Abstract — We consider least squares estimators of carrier phase and amplitude from a noisy communications signal that contains both pilot signals, known to the receiver, and data signals, unknown to the receiver. We focus on signaling constellations that have symbols evenly distributed on the complex unit circle, i.e., M-ary phase shift keying. We show, under reasonably mild conditions on the distribution of the noise, that the least squares estimator of carrier phase is strongly consistent and asymptotically normally distributed. However, the amplitude estimator is not consistent, but converges to a positive real number that is a function of the true carrier amplitude, the noise distribution and the size of the constellation. Our theoretical results can also be applied to the case where no pilot symbols exist, i.e., noncoherent detection. The results of Monte Carlo simulations are provided and these agree with the theoretical results.

Index Terms — Detection, phase shift keying, asymptotic statistics



I. Introduction 

In passband communication systems the transmitted signal typically undergoes time offset (delay), phase shift and attenuation (amplitude change). These effects must be compensated for at the receiver. In this paper we assume that the time offset has been previously handled, and we focus on estimating the phase shift and attenuation. We consider signalling constellations that have symbols evenly distributed on the complex unit circle such as binary phase shift keying (BPSK), quaternary phase shift keying (QPSK) and M-ary phase shift keying (M-PSK). In this case, the transmitted symbols take the form

$$e^{ju}, \qquad u \in \left\{0, \frac{2\pi}{M}, \dots, \frac{2\pi(M-1)}{M}\right\},$$

where $j = \sqrt{-1}$ and $M \geq 2$ is the size of the constellation. We assume that some of the transmitted symbols are pilot symbols known to the receiver and the remainder are information carrying data symbols with phase that is unknown to the receiver. So,

$$s_i = \begin{cases} p_i, & i \in P, \\ d_i, & i \in D, \end{cases}$$

where $P$ is the set of indices describing the position of the pilot symbols $p_i$, and $D$ is a set of indices describing the position of the data symbols $d_i$. The sets $P$ and $D$ are disjoint, i.e., $P \cap D = \emptyset$, where $\emptyset$ is the empty set, and we let $L = |P \cup D|$ be the total number of symbols transmitted.

We assume that time offset estimation has been performed and that $L$ noisy M-PSK symbols are observed by the receiver. The received signal after matched filtering is

$$y_i = a_0 s_i + w_i, \qquad i \in P \cup D, \tag{1}$$

where $w_i$ is noise and $a_0 = \rho_0 e^{j\theta_0}$ is a complex number representing both carrier phase $\theta_0$ and amplitude $\rho_0$ (by definition $\rho_0$ is a positive real number). Our aim is to estimate $a_0$ from the noisy symbols $\{y_i, i \in P \cup D\}$. Complicating matters is that the data symbols $\{d_i, i \in D\}$ are not known to the receiver and must also be estimated. Estimation problems of this type have undergone extensive prior study [2]-[10]. A practical approach is the least squares estimator, that is, the minimiser of the sum of squares function

$$SS(a, \{d_i, i \in D\}) = \sum_{i \in P \cup D} |y_i - a s_i|^2, \tag{2}$$

where $|x|$ denotes the magnitude of the complex number $x$. The least squares estimator is also the maximum likelihood estimator under the assumption that the noise sequence $\{w_i, i \in \mathbb{Z}\}$ is additive white and Gaussian. However, as we show, the estimator works well under less stringent assumptions.
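The signal model (1) and the objective (2) can be exercised directly. The following Python sketch is illustrative only: the helper names `mpsk_symbol`, `simulate` and `sum_of_squares` are hypothetical, and Gaussian noise with per-component standard deviation `sigma` is an assumption made for the example (the analysis below only requires circular symmetry). At the true parameters, SS reduces to the noise energy.

```python
import cmath
import math
import random


def mpsk_symbol(m, M):
    """Return the M-PSK symbol exp(j*2*pi*m/M) for an integer m."""
    return cmath.exp(2j * math.pi * m / M)


def simulate(P, D, M, a0, sigma, rng):
    """Draw y_i = a0 * s_i + w_i for i in P u D, following model (1).

    Pilot and data symbols are drawn uniformly from the constellation.
    The noise w_i is assumed circularly symmetric complex Gaussian with
    standard deviation sigma in each of the real and imaginary parts.
    """
    s = {i: mpsk_symbol(rng.randrange(M), M) for i in list(P) + list(D)}
    w = {i: complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for i in s}
    y = {i: a0 * s[i] + w[i] for i in s}
    return y, s, w


def sum_of_squares(a, y, s):
    """Evaluate SS(a, {d_i}) from (2) for candidate amplitude a and symbols s."""
    return sum(abs(y[i] - a * s[i]) ** 2 for i in y)


# Example: 4 pilots followed by 12 data symbols, QPSK (M = 4).
rng = random.Random(1)
P, D = range(0, 4), range(4, 16)
a0 = 0.8 * cmath.exp(0.3j)
y, s, w = simulate(P, D, 4, a0, 0.1, rng)
noise_energy = sum(abs(v) ** 2 for v in w.values())
```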

The existing literature mostly considers what is called noncoherent detection, where no pilot symbols exist ($P = \emptyset$). In the noncoherent setting differential encoding is often used, and for this reason the estimation problem has been called multiple symbol differential detection. A popular approach is the so-called non-data aided, sometimes also called non-decision directed, estimator based on the paper of Viterbi and Viterbi [2]. The idea is to 'strip' the modulation from the received signal by taking $y_i/|y_i|$ to the power of $M$. A function $F : \mathbb{R} \to \mathbb{R}$ is chosen and the estimator of the carrier phase $\theta_0$ is taken to be $\frac{1}{M}\angle A$, where $\angle$ denotes the complex argument and

$$A = \sum_{i \in P \cup D} F(|y_i|)\left(\frac{y_i}{|y_i|}\right)^M. \tag{3}$$

Various choices for $F$ are suggested in [2] and a statistical analysis is presented. A caveat of this estimator is that it is not obvious how pilot symbols should be included. This problem does not occur with the least squares estimator.
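A minimal Python sketch of the estimator (3) follows, assuming the choice $F(x) = 1$ (the option used later in Section VIII); `viterbi_viterbi` is a hypothetical helper name. It also illustrates why this estimator only determines the phase modulo $2\pi/M$: raising to the power $M$ removes the modulation, and with it any rotation by a multiple of $2\pi/M$.

```python
import cmath
import math


def viterbi_viterbi(y, M, F=lambda r: 1.0):
    """Non-data-aided phase estimate (1/M) * angle(A), with A from (3).

    y: sequence of received complex samples (all assumed nonzero).
    F: amplitude transformation; F(x) = 1 by default.
    The result lies in [-pi/M, pi/M] because cmath.phase returns [-pi, pi].
    """
    A = sum(F(abs(v)) * (v / abs(v)) ** M for v in y)
    return cmath.phase(A) / M


# Noiseless QPSK with carrier phase 0.2 rad: the phase is recovered exactly.
M = 4
symbols = [cmath.exp(2j * math.pi * m / M) for m in (0, 3, 1, 2, 2)]
y_small = [cmath.exp(0.2j) * s for s in symbols]
# With carrier phase 1.0 rad the estimate is 1.0 - pi/2 (ambiguity of 2*pi/M).
y_large = [cmath.exp(1.0j) * s for s in symbols]
```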

An important paper is by Mackenthun [7], who described an algorithm to compute the least squares estimator requiring only $O(L \log L)$ arithmetic operations. Sweldens [8] rediscovered Mackenthun's algorithm in 2001. Both Mackenthun and Sweldens considered only the noncoherent setting, but we show in Section II that Mackenthun's algorithm can be modified to include pilot symbols. Our model includes the noncoherent case by setting the number of pilot symbols to zero, that is, putting $P = \emptyset$.

In the literature it has been common to assume that the data symbols $\{d_i, i \in D\}$ are of primary interest and that the complex amplitude $a_0$ is a nuisance parameter. The metric of performance is correspondingly the symbol error rate, or bit error rate. While estimating the symbols (or more precisely the transmitted bits) is ultimately the goal, we take the opposite point of view here. Our aim is to estimate $a_0$, and we treat the unknown data symbols as nuisance parameters. This is motivated by the fact that in many modern communication systems the data symbols are coded. For this reason raw symbol error rate is not of interest at this stage. Instead, we desire an accurate estimator $\hat{a}$ of $a_0$, so that the compensated received symbols $\hat{a}^{-1} y_i$ can be accurately modelled using an additive noise channel. The additive noise channel is a common assumption for subsequent receiver operations, such as decoding. The estimator $\hat{a}$ is also used in the computation of decoder metrics for modern decoders, and for interference cancellation in multiuser systems. Consequently, our metric of performance will not be symbol or bit error rate, but $|\hat{a} - a_0|$. It will be informative to consider the carrier phase and amplitude estimators separately, that is, if $\hat{a} = \hat{\rho} e^{j\hat{\theta}}$ where $\hat{\rho}$ is a positive real number, then we consider $|\langle \hat{\theta} - \theta_0 \rangle_\pi|$ and $|\hat{\rho} - \rho_0|$. The function $\langle \cdot \rangle_\pi$ denotes its argument taken 'modulo $2\pi$' into the interval $[-\pi, \pi)$. It will become apparent why $\langle \hat{\theta} - \theta_0 \rangle_\pi$ rather than $\hat{\theta} - \theta_0$ is the appropriate measure of error for the phase parameter.

It is possible to generalise the results we present here to allow data symbols with varying constellation size, i.e., varying $M$. For example, one might give more importance to certain data symbols and use BPSK ($M = 2$) for these, but QPSK ($M = 4$) for other less important symbols. This is related to what is called unequal error protection in the literature [11], [12]. To keep our ideas and notation focused we do not consider this further here.

The paper is organised in the following way. Section II extends Mackenthun's algorithm to the coherent case, when both pilot symbols and data symbols exist. Section III describes properties of complex random variables that we need. Section IV states two theorems that describe the statistical properties of the least squares estimator of carrier phase $\hat{\theta}$ and amplitude $\hat{\rho}$. We show, under some reasonably general assumptions about the distribution of the noise $w_1, \dots, w_L$, that $\langle \hat{\theta} - \theta_0 \rangle_\pi$ converges almost surely to zero and that $\sqrt{L}\langle \hat{\theta} - \theta_0 \rangle_\pi$ is asymptotically normally distributed as $L \to \infty$. However, $\hat{\rho}$ is not a consistent estimator of the amplitude $\rho_0$. The asymptotic bias of $\hat{\rho}$ is small when the signal to noise ratio (SNR) is large, but the asymptotic bias is significant when the SNR is small. Sections V and VI provide proofs of the theorems stated in Section IV. In Section VII we consider the special case when the noise is Gaussian. In this case, our expressions for the asymptotic distribution can be simplified. Section VIII presents the results of Monte Carlo simulations. These simulations agree with the derived asymptotic properties.

II. Mackenthun's algorithm with pilots 

In this section we derive Mackenthun's algorithm to compute the least squares estimator of the carrier phase and amplitude [7]. Mackenthun specifically considered the noncoherent setting, so we modify the algorithm to include the pilot symbols. For the purpose of analysing computational complexity, we will assume that the number of data symbols $|D|$ is proportional to the total number of symbols $L$, so that, for example, $O(L) = O(|D|)$. In this case Mackenthun's algorithm requires $O(L \log L)$ arithmetic operations. This complexity arises from the need to sort a list of elements.

Define the sum of squares function

$$SS(a, \{d_i, i \in D\}) = \sum_{i \in P \cup D} |y_i - a s_i|^2 = \sum_{i \in P \cup D} \left(|y_i|^2 - a s_i y_i^* - a^* s_i^* y_i + a a^*\right), \tag{4}$$

where $^*$ denotes complex conjugate (recall that $|s_i| = 1$). The minimiser of $SS$ with respect to $a$ as a function of $\{d_i, i \in D\}$ is

$$\hat{a}(\{d_i, i \in D\}) = \frac{1}{L}\sum_{i \in P \cup D} y_i s_i^* = \frac{1}{L} Y, \tag{5}$$

where $L = |P \cup D|$ is the total number of symbols transmitted, and to simplify our notation we have put

$$Y = \sum_{i \in P \cup D} y_i s_i^* = \sum_{i \in P} y_i p_i^* + \sum_{i \in D} y_i d_i^*.$$

Note that $Y$ is a function of the unknown data symbols $\{d_i, i \in D\}$ and we could write $Y(\{d_i, i \in D\})$, but have chosen to suppress the argument for notational brevity. Substituting $\frac{1}{L}Y$ for $a$ into (4) we obtain $SS$ minimised with respect to $a$,

$$SS(\{d_i, i \in D\}) = A - \frac{1}{L}|Y|^2, \tag{6}$$

where $A = \sum_{i \in P \cup D} |y_i|^2$ does not depend on the $d_i$. The least squares estimators of the data symbols are the minimisers of (6). Observe that given candidate values for the data symbols, we can compute the corresponding $SS(\{d_i, i \in D\})$ in $O(L)$ arithmetic operations. It turns out that there are at most $M|D|$ candidate values of the least squares estimator of the data symbols [7], [8].
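For very small problems the minimiser of (6) can also be found by exhaustive search over all $M^{|D|}$ data sequences, which gives a useful reference against which to test faster algorithms. The Python sketch below is illustrative only (the name `ls_bruteforce` is hypothetical); its cost grows exponentially in $|D|$, in contrast with the $M|D|$ candidates mentioned above.

```python
import cmath
import itertools
import math


def ls_bruteforce(y_pilot, p, y_data, M):
    """Exhaustive least squares estimate of a0 using (5) and (6).

    y_pilot, p: received pilot samples and the corresponding known pilots.
    y_data: received data samples.
    Every one of the M**len(y_data) data sequences is tried, so this is only
    a reference for very small problems.
    """
    L = len(y_pilot) + len(y_data)
    A = sum(abs(v) ** 2 for v in list(y_pilot) + list(y_data))
    B = sum(v * q.conjugate() for v, q in zip(y_pilot, p))
    constellation = [cmath.exp(2j * math.pi * m / M) for m in range(M)]
    best_ss, best_a = math.inf, None
    for d in itertools.product(constellation, repeat=len(y_data)):
        Y = B + sum(v * di.conjugate() for v, di in zip(y_data, d))
        ss = A - abs(Y) ** 2 / L          # SS minimised over a, equation (6)
        if ss < best_ss:
            best_ss, best_a = ss, Y / L   # minimiser over a, equation (5)
    return best_a, best_ss


# Noiseless example: two QPSK pilots and three data symbols.
a0 = 0.7 * cmath.exp(0.4j)
p = [1, 1j]
d = [-1, -1j, 1]
a_hat, ss_min = ls_bruteforce([a0 * q for q in p], p, [a0 * x for x in d], 4)
```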

To see this, let $a = \rho e^{j\theta}$ where $\rho$ is a nonnegative real. Now,

$$SS(\rho, \theta, \{d_i, i \in D\}) = \sum_{i \in P} |y_i - \rho e^{j\theta} p_i|^2 + \sum_{i \in D} |y_i - \rho e^{j\theta} d_i|^2. \tag{7}$$

We have slightly abused notation here by reusing $SS$. This should not cause confusion as $SS(a, \{d_i, i \in D\})$, $SS(\rho, \theta, \{d_i, i \in D\})$, and $SS(\{d_i, i \in D\})$ are easily told apart by their arguments. For given $\theta$, the least squares estimator of the $i$th data symbol $d_i$ is given by minimising $|y_i - \rho e^{j\theta} d_i|^2$, that is,



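$$\hat{d}_i(\theta) = e^{j u_i(\theta)}, \qquad \text{where} \qquad u_i(\theta) = \left\lfloor \angle\!\left(e^{-j\theta} y_i\right) \right\rceil, \tag{8}$$

where $\angle(\cdot)$ denotes the complex argument (or phase), and $\lfloor \cdot \rceil$ rounds its argument to the nearest multiple of $\frac{2\pi}{M}$. A word of caution: the notation $\lfloor \cdot \rceil$ is often used to denote rounding to the nearest integer. This is not the case here. If the function $\operatorname{round}(\cdot)$ takes its argument to the nearest integer then

$$\lfloor x \rceil = \frac{2\pi}{M}\operatorname{round}\!\left(\frac{M}{2\pi} x\right).$$

Note that $\hat{d}_i(\theta)$ does not depend on $\rho$. As defined, $u_i(\theta)$ is not strictly inside the set $\{0, \frac{2\pi}{M}, \dots, \frac{2\pi(M-1)}{M}\}$, but this is not of consequence, as we intend its value to be considered equivalent modulo $2\pi$. With this in mind,

$$u_i(\theta) = \lfloor \angle y_i - \theta \rceil,$$

which is equivalent to the definition from (8) modulo $2\pi$.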
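The rounding operator $\lfloor \cdot \rceil$ and the hard decision (8) translate directly into code. In this Python sketch, `round_psk` and `hard_decision` are hypothetical helper names; half-way cases are rounded up, matching the convention adopted in Section IV. Python's built-in `round` is deliberately avoided because it rounds half-way cases to the nearest even integer.

```python
import cmath
import math


def round_psk(x, M):
    """Round x to the nearest multiple of 2*pi/M (half-way cases rounded up)."""
    step = 2 * math.pi / M
    return step * math.floor(x / step + 0.5)


def hard_decision(y, theta, M):
    """Return d_i(theta) = exp(j * u_i(theta)), u_i(theta) = round_psk(angle(y) - theta).

    As noted in the text, u_i(theta) is not reduced modulo 2*pi; the complex
    exponential makes that reduction unnecessary.
    """
    u = round_psk(cmath.phase(y) - theta, M)
    return cmath.exp(1j * u)
```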

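We only need to consider $\theta$ in the interval $[0, 2\pi)$. Consider how $\hat{d}_i(\theta)$ changes as $\theta$ varies from $0$ to $2\pi$. Let $b_i = \hat{d}_i(0)$ and let

$$z_i = \langle \angle y_i \rangle = \angle y_i - \lfloor \angle y_i \rceil$$

be the phase difference between the received symbol $y_i$ and the hard decision resulting when $\theta = 0$, i.e., $\lfloor \angle y_i \rceil$. Here $\langle x \rangle = x - \lfloor x \rceil$ denotes $x$ taken 'modulo $\frac{2\pi}{M}$' into $[-\frac{\pi}{M}, \frac{\pi}{M})$ (see Section IV). Then,

$$\hat{d}_i(\theta) = \begin{cases} b_i, & 0 \le \theta < z_i + \frac{\pi}{M}, \\ b_i e^{-j2\pi k/M}, & z_i + \frac{\pi(2k-1)}{M} \le \theta < z_i + \frac{\pi(2k+1)}{M}, \quad k = 1, \dots, M-1, \\ b_i, & z_i + \frac{\pi(2M-1)}{M} \le \theta < 2\pi. \end{cases} \tag{9}$$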



Let

$$f(\theta) = \{\hat{d}_i(\theta), i \in D\}$$

be a function mapping the interval $[0, 2\pi)$ to a sequence of M-PSK symbols indexed by the elements of $D$. Observe that $f(\theta)$ is piecewise constant. The subintervals of $[0, 2\pi)$ over which $f(\theta)$ remains constant are determined by the values of $\{z_i, i \in D\}$. Let

$$S = \{f(\theta) \mid \theta \in [0, 2\pi)\}$$

be the set of all sequences $f(\theta)$ as $\theta$ varies from $0$ to $2\pi$. If $\hat{\theta}$ is the least squares estimator of the phase then $S$ contains the sequence $\{\hat{d}_i(\hat{\theta}), i \in D\}$ corresponding to the least squares estimator of the data symbols, i.e., $S$ contains the minimiser of (6). Observe from (9) that there are at most $M|D|$ sequences in $S$, because each $\hat{d}_i(\theta)$ changes value $M$ times as $\theta$ varies from $0$ to $2\pi$, and the sequence reached just before $\theta = 2\pi$ coincides with $f(0)$.

The sequences in $S$ can be enumerated as follows. Let $\sigma$ denote the permutation of the indices in $D$ such that $z_{\sigma(i)}$ are in ascending order, that is,

$$z_{\sigma(i)} \le z_{\sigma(k)} \tag{10}$$

whenever $i < k$ where $i, k \in \{0, 1, \dots, |D| - 1\}$. It is convenient to define the indices into $\sigma$ to be taken modulo $|D|$, that is, if $m$ is an integer not from $\{0, 1, \dots, |D| - 1\}$ then we define $\sigma(m) = \sigma(k)$ where $k = m \bmod |D|$ and $k \in \{0, 1, \dots, |D| - 1\}$. The first sequence in $S$ is

$$f_0 = f(0) = \{\hat{d}_i(0), i \in D\} = \{b_i, i \in D\}.$$

The next sequence $f_1$ is given by replacing the element $b_{\sigma(0)}$ in $f_0$ with $b_{\sigma(0)} e^{-j2\pi/M}$. Given a sequence $x$ we use $x e_i$ to denote $x$ with the $i$th element replaced by $x_i e^{-j2\pi/M}$. Using this notation,

$$f_1 = f_0 e_{\sigma(0)}.$$

The next sequence in $S$ is correspondingly

$$f_2 = f_0 e_{\sigma(0)} e_{\sigma(1)} = f_1 e_{\sigma(1)},$$

and, in general,

$$f_{k+1} = f_k e_{\sigma(k)}. \tag{11}$$

In this way, all $M|D|$ sequences in $S$ can be recursively enumerated.
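The enumeration can be sketched in Python as follows. This is a hedged illustration: `enumerate_candidates` is a hypothetical name, the data indices are relabelled $0, 1, \dots, |D|-1$, and half-way cases in the rounding are broken upwards. Python's `sorted` with a key supplies the permutation $\sigma$ of (10).

```python
import cmath
import math


def enumerate_candidates(y_data, M):
    """Yield f_0, f_1, ..., f_{M|D|-1} from (11) as lists of M-PSK symbols."""
    step = 2 * math.pi / M
    phases = [cmath.phase(v) for v in y_data]
    u = [step * math.floor(ph / step + 0.5) for ph in phases]   # hard decisions at theta = 0
    z = [ph - ui for ph, ui in zip(phases, u)]                   # z_i in [-pi/M, pi/M)
    sigma = sorted(range(len(y_data)), key=lambda i: z[i])       # permutation of (10)
    f = [cmath.exp(1j * ui) for ui in u]                         # f_0 = {b_i}
    rotate = cmath.exp(-2j * math.pi / M)
    for k in range(M * len(y_data)):
        yield list(f)
        i = sigma[k % len(y_data)]   # indices into sigma are taken modulo |D|
        f[i] *= rotate               # f_{k+1} = f_k e_{sigma(k)}
```

Every sequence $f(\theta)$ obtained by hard decisions on a grid of $\theta$ values should appear among the $M|D|$ yielded sequences, which is what the check below exercises.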

We want to find the $f_k \in S$ corresponding to the minimiser of (6). A naive approach would be to compute $SS(f_k)$ for each $k \in \{0, 1, \dots, M|D| - 1\}$. Computing $SS(f_k)$ for any particular $k$ requires $O(L)$ arithmetic operations. So, this naive approach would require $O(LM|D|) = O(L^2)$ operations in total. Following Mackenthun [7], we show how $SS(f_k)$ can be computed recursively. From (6),

$$SS(f_k) = A - \frac{1}{L}|Y_k|^2, \tag{12}$$

where

$$Y_k = Y(f_k) = \sum_{i \in P} y_i p_i^* + \sum_{i \in D} y_i f_{ki}^* = B + \sum_{i \in D} g_{ki},$$

where $B = \sum_{i \in P} y_i p_i^*$ is independent of the data symbols, $f_{ki}$ denotes the $i$th symbol in $f_k$, and for convenience, we put $g_{ki} = y_i f_{ki}^*$. Letting $g_k$ be the sequence $\{g_{ki}, i \in D\}$ we have, from (11), that $g_k$ satisfies the recursive equation

$$g_{k+1} = g_k e^*_{\sigma(k)},$$

where $g_k e^*_{\sigma(k)}$ indicates the sequence $g_k$ with the $\sigma(k)$th element replaced by $g_{k\sigma(k)} e^{j2\pi/M}$. Now,

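$$Y_0 = B + \sum_{i \in D} g_{0i}$$

can be computed in $O(L)$ operations, and

$$Y_1 = B + \sum_{i \in D} g_{1i} = B + (\eta + 1) g_{0\sigma(0)} + \sum_{i \in D,\, i \neq \sigma(0)} g_{0i} = Y_0 + \eta g_{0\sigma(0)},$$

where $\eta = e^{j2\pi/M} - 1$. In general,

$$Y_{k+1} = Y_k + \eta g_{k\sigma(k)}.$$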
Input: {y_i, i ∈ P ∪ D}
 1  for i ∈ D do
 2      φ = ∠y_i
 3      u = ⌊φ⌉
 4      z_i = φ − u
 5      g_i = y_i e^{−ju}
 6  Y = Σ_{i∈P} y_i p_i^* + Σ_{i∈D} g_i
 7  â = Y/L
 8  Q̂ = |Y|²/L
 9  η = e^{j2π/M} − 1
10  σ = sortindices(z)
11  for k = 0 to M|D| − 1 do
12      Y = Y + η g_{σ(k)}
13      g_{σ(k)} = (η + 1) g_{σ(k)}
14      Q = |Y|²/L
15      if Q > Q̂ then
16          Q̂ = Q
17          â = Y/L
18  return â

Algorithm 1: Mackenthun's algorithm with pilot symbols



So, each $Y_k$ can be computed from its predecessor $Y_{k-1}$ in $O(1)$ arithmetic operations. Given $Y_k$, the value of $SS(f_k)$ can be computed in $O(1)$ operations using (12). Let $\hat{k} = \arg\min_k SS(f_k)$, or equivalently $\hat{k} = \arg\max_k |Y_k|^2$. The least squares estimator of $a_0$ is then computed according to (5),

$$\hat{a} = \frac{1}{L} Y_{\hat{k}}. \tag{13}$$

Pseudocode is given in Algorithm 1. Line 10 contains the function sortindices that, given $z = \{z_i, i \in D\}$, returns the permutation $\sigma$ as described in (10). The sortindices function requires sorting $|D|$ elements. This requires $O(L \log L)$ operations. The sortindices function is the primary bottleneck in this algorithm when $L$ is large. The loops on lines 1 and 11 and the operations on lines 6 to 8 all require $O(L)$ or fewer operations.
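Algorithm 1 translates almost line for line into Python. The sketch below is a hedged illustration rather than a reference implementation: `mackenthun_with_pilots` is a hypothetical name, Python's `sorted` plays the role of sortindices, indices into $\sigma$ are taken modulo $|D|$ as in Section II, and the comments refer to the pseudocode line numbers. Because minimising (12) is the same as maximising $|Y_k|^2$, lines 15 to 17 keep track of the largest $Q$.

```python
import cmath
import math


def mackenthun_with_pilots(y_pilot, p, y_data, M):
    """Least squares estimate of a0 following Algorithm 1, O(L log L) operations.

    y_pilot, p: received pilot samples and the known pilot symbols.
    y_data: received data samples.
    """
    L = len(y_pilot) + len(y_data)
    step = 2 * math.pi / M
    z, g = [], []
    for v in y_data:                                        # line 1
        phi = cmath.phase(v)                                # line 2
        u = step * math.floor(phi / step + 0.5)             # line 3: nearest multiple of 2*pi/M
        z.append(phi - u)                                   # line 4
        g.append(v * cmath.exp(-1j * u))                    # line 5
    Y = sum(v * q.conjugate() for v, q in zip(y_pilot, p)) + sum(g)   # line 6
    a_hat = Y / L                                           # line 7
    Q_hat = abs(Y) ** 2 / L                                 # line 8
    eta = cmath.exp(2j * math.pi / M) - 1                   # line 9
    sigma = sorted(range(len(y_data)), key=z.__getitem__)   # line 10
    for k in range(M * len(y_data)):                        # line 11
        i = sigma[k % len(y_data)]
        Y += eta * g[i]                                     # line 12
        g[i] *= eta + 1                                     # line 13
        Q = abs(Y) ** 2 / L                                 # line 14
        if Q > Q_hat:                                       # lines 15 to 17
            Q_hat, a_hat = Q, Y / L
    return a_hat                                            # line 18
```

On small random problems the result can be cross-checked against an exhaustive search over all $M^{|D|}$ data sequences.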

III. Circularly symmetric complex random variables

Before describing the statistical properties of the least squares estimator, we first require some properties of complex valued random variables. A complex random variable $W$ is said to be circularly symmetric if its phase $\angle W$ is independent of its magnitude $|W|$ and if the distribution of $\angle W$ is uniform on $[0, 2\pi)$. That is, if $Z \ge 0$ and $\Theta \in [0, 2\pi)$ are real random variables such that $Z e^{j\Theta} = W$, then $\Theta$ is uniformly distributed on $[0, 2\pi)$ and is independent of $Z$. If the probability density function (pdf) of $Z$ is $f_Z(z)$, then the joint pdf of $Z$ and $\Theta$ is

$$f_{Z,\Theta}(z, \theta) = \frac{1}{2\pi} f_Z(z).$$
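Observe that for any real number $\phi$, the pdf of $W$ and $e^{j\phi}W$ are the same, that is, the pdf is invariant to phase rotation. If $\mathrm{E}|W| = \mathrm{E}Z$ is finite, then $W$ has zero mean because

$$\mathrm{E}W = \int_0^{2\pi}\!\int_0^\infty z e^{j\theta} f_{Z,\Theta}(z, \theta)\, dz\, d\theta = \frac{1}{2\pi}\int_0^{2\pi}\!\int_0^\infty z f_Z(z)\, e^{j\theta}\, dz\, d\theta = \frac{\mathrm{E}Z}{2\pi}\int_0^{2\pi} e^{j\theta}\, d\theta = 0.$$

If $X$ and $Y$ are real random variables equal to the real and imaginary parts of $W = X + jY$ then the joint pdf of $X$ and $Y$ is

$$f_{X,Y}(x, y) = \frac{f_Z\!\left(\sqrt{x^2 + y^2}\right)}{2\pi\sqrt{x^2 + y^2}}.$$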

We will have particular use of complex random variables of the form $1 + W$ where $W$ is circularly symmetric. Let $R \ge 0$ and $\Phi \in [0, 2\pi)$ be real random variables satisfying

$$R e^{j\Phi} = 1 + W.$$

The joint pdf of $R$ and $\Phi$ can be shown to be

$$f(r, \phi) = \frac{r}{2\pi\sqrt{r^2 - 2r\cos\phi + 1}}\, f_Z\!\left(\sqrt{r^2 - 2r\cos\phi + 1}\right). \tag{14}$$

Since $\cos\phi$ has period $2\pi$ and is even on $[-\pi, \pi]$ it follows that $f(r, \phi)$ has period $2\pi$ and is even on $[-\pi, \pi]$ with respect to $\phi$. The mean of $Re^{j\Phi}$ is equal to one because the mean of $W$ is zero. So,

$$\mathrm{E}\,\Re(Re^{j\Phi}) = \mathrm{E}R\cos\Phi = 1, \tag{15}$$

where $\Re(\cdot)$ denotes the real part, and

$$\mathrm{E}\,\Im(Re^{j\Phi}) = \mathrm{E}R\sin\Phi = 0, \tag{16}$$

where $\Im(\cdot)$ denotes the imaginary part.
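Equations (15) and (16) can be checked numerically. The Python sketch below assumes Gaussian $W$ (one particular circularly symmetric law; `mean_R_cos_sin` is a hypothetical helper) and estimates $\mathrm{E}R\cos\Phi$ and $\mathrm{E}R\sin\Phi$ by Monte Carlo. The estimates should be close to 1 and 0 even when the noise is much stronger than the unit-amplitude signal.

```python
import cmath
import math
import random


def mean_R_cos_sin(sigma, n, seed=0):
    """Monte Carlo estimates of E[R cos(Phi)] and E[R sin(Phi)] from (15)-(16).

    R e^{j Phi} = 1 + W, where W is assumed circularly symmetric complex
    Gaussian with standard deviation sigma in each component.
    """
    rng = random.Random(seed)
    acc_cos = acc_sin = 0.0
    for _ in range(n):
        R, Phi = cmath.polar(1 + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)))
        acc_cos += R * math.cos(Phi)
        acc_sin += R * math.sin(Phi)
    return acc_cos / n, acc_sin / n


# Noise standard deviation twice the signal amplitude.
est_cos, est_sin = mean_R_cos_sin(2.0, 100000)
```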



IV. Statistical properties of the least squares estimator

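In this section we describe the asymptotic properties of the least squares estimator. In what follows we use $\langle x \rangle_\pi$ to denote $x$ taken 'modulo $2\pi$' into the interval $[-\pi, \pi)$, that is

$$\langle x \rangle_\pi = x - 2\pi\operatorname{round}\!\left(\frac{x}{2\pi}\right),$$

where $\operatorname{round}(\cdot)$ takes its argument to the nearest integer. The direction of rounding for half-integers is not important so long as it is consistent. We have chosen to round up half-integers here. Similarly we use $\langle x \rangle$ to denote $x$ taken 'modulo $\frac{2\pi}{M}$' into the interval $[-\frac{\pi}{M}, \frac{\pi}{M})$, that is

$$\langle x \rangle = x - \frac{2\pi}{M}\operatorname{round}\!\left(\frac{M}{2\pi}x\right) = x - \lfloor x \rceil.$$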
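Both wrapping operations are one-liners. This Python sketch (hypothetical names `wrap_pi` and `wrap_M`) uses `math.floor(t + 0.5)` to realise $\operatorname{round}(\cdot)$ with half-integers rounded up, as chosen above, and reproduces the $\pm 0.99\pi$ example discussed after Theorem 2.

```python
import math


def wrap_pi(x):
    """<x>_pi: x taken 'modulo 2*pi' into [-pi, pi), half-integers rounded up."""
    return x - 2 * math.pi * math.floor(x / (2 * math.pi) + 0.5)


def wrap_M(x, M):
    """<x>: x taken 'modulo 2*pi/M' into [-pi/M, pi/M), i.e. x minus its rounding."""
    step = 2 * math.pi / M
    return x - step * math.floor(x / step + 0.5)


# Phase error between 0.99*pi and -0.99*pi measured modulo 2*pi.
phase_error = wrap_pi(-0.99 * math.pi - 0.99 * math.pi)
```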



The next two theorems describe the asymptotic properties of the least squares estimator. These are the central results and the chief original contributions of this paper.

Theorem 1. (Almost sure convergence) Let $\{w_i\}$ be a sequence of independent and identically distributed, circularly symmetric complex random variables with $\mathrm{E}|w_i|^2$ finite, and let $\{y_i, i \in P \cup D\}$ be given by (1). Let $\hat{a} = \hat{\rho}e^{j\hat{\theta}}$ be the least squares estimator of $a_0 = \rho_0 e^{j\theta_0}$. Put $L = |P \cup D|$ and let $|P|$ and $|D|$ increase in such a way that

$$\frac{|P|}{L} \to p \quad \text{and} \quad \frac{|D|}{L} \to d \quad \text{as } L \to \infty.$$

Let $R_i \ge 0$ and $\Phi_i \in [0, 2\pi)$ be real random variables satisfying

$$R_i e^{j\Phi_i} = 1 + \frac{w_i}{a_0 s_i} \tag{17}$$

and define the continuous function

$$G(x) = p h_1(x) + d h_2(x), \quad \text{where} \quad h_1(x) = \mathrm{E}R_1\cos(x + \Phi_1), \quad h_2(x) = \mathrm{E}R_1\cos\langle x + \Phi_1 \rangle.$$

If $p > 0$ and if $G(x)$ is uniquely maximised at $x = 0$ over the interval $[-\pi, \pi)$ then

1) $\langle \hat{\theta} - \theta_0 \rangle_\pi \to 0$ almost surely as $L \to \infty$,

2) $\hat{\rho} \to \rho_0 G(0)$ almost surely as $L \to \infty$.

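Theorem 2. (Asymptotic normality) Under the same conditions as Theorem 1, let $f(r, \phi)$ be the joint probability density function of $R_1$ and $\Phi_1$, let

$$g(\phi) = \int_0^\infty r f(r, \phi)\, dr$$

and assume that $\frac{|P|}{L} = p + o(L^{-1/2})$ and $\frac{|D|}{L} = d + o(L^{-1/2})$. Put $\lambda_L = -\langle \hat{\theta} - \theta_0 \rangle_\pi = \langle \theta_0 - \hat{\theta} \rangle_\pi$ and $m_L = \hat{\rho} - \rho_0 G(0)$. If the function $g$ is continuous at $\frac{\pi(2k+1)}{M}$ for each $k = 0, \dots, M-1$, then the distribution of $(\sqrt{L}\lambda_L, \sqrt{L}m_L)$ converges to the bivariate normal with zero mean and covariance matrix

$$\begin{bmatrix} \dfrac{pA_1 + dA_2}{(p + Hd)^2} & 0 \\ 0 & \rho_0^2(pB_1 + dB_2) \end{bmatrix}$$

as $L \to \infty$, where

$$H = h_2(0) - 2\sin\!\left(\frac{\pi}{M}\right)\sum_{k=0}^{M-1} g\!\left(\frac{\pi(2k+1)}{M}\right),$$

$$A_1 = \mathrm{E}R_1^2\sin^2(\Phi_1), \qquad A_2 = \mathrm{E}R_1^2\sin^2(\langle \Phi_1 \rangle),$$

$$B_1 = \mathrm{E}R_1^2\cos^2(\Phi_1) - 1, \qquad B_2 = \mathrm{E}R_1^2\cos^2(\langle \Phi_1 \rangle) - h_2^2(0).$$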
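The constants of Theorem 2 are straightforward to evaluate numerically for a given noise law. The Python sketch below assumes Gaussian noise (the case treated in Section VII) with per-component standard deviation $s = \sigma/\rho_0$, and uses the fact noted in Section VII that, by circular symmetry, $R_1 e^{j\Phi_1}$ has the same distribution as $1 + w_1/\rho_0$. $A_1$, $A_2$, $B_1$, $B_2$ and $h_2(0)$ are estimated by Monte Carlo, and $g$ is obtained by numerically integrating $r f(r, \phi)$ from (14); all function names are hypothetical. `atan2` returns $\Phi$ in $(-\pi, \pi]$ rather than $[0, 2\pi)$, which does not affect any of these quantities because they are $2\pi$-periodic in $\Phi$.

```python
import math
import random


def wrap_M(x, M):
    """<x>: x taken modulo 2*pi/M into [-pi/M, pi/M)."""
    step = 2 * math.pi / M
    return x - step * math.floor(x / step + 0.5)


def g_gaussian(phi, s, n=4000):
    """g(phi) = int_0^inf r f(r, phi) dr, with f from (14) for Gaussian noise of
    per-component standard deviation s (relative to rho0). Trapezoidal rule
    on [0, 1 + 12 s], beyond which the integrand is negligible."""
    r_max = 1.0 + 12.0 * s
    h = r_max / n
    total = 0.0
    for k in range(n + 1):
        r = k * h
        f = r / (2 * math.pi * s * s) * math.exp(-(r * r - 2 * r * math.cos(phi) + 1) / (2 * s * s))
        total += (0.5 if k in (0, n) else 1.0) * r * f
    return total * h


def theorem2_constants(M, s, n_mc=100000, seed=0):
    """Monte Carlo estimates of A1, A2, B1, B2 and h2(0); H via g_gaussian."""
    rng = random.Random(seed)
    A1 = A2 = B1 = B2 = h2 = 0.0
    for _ in range(n_mc):
        x, y = 1.0 + rng.gauss(0.0, s), rng.gauss(0.0, s)   # R e^{j Phi} = 1 + w / rho0
        R, Phi = math.hypot(x, y), math.atan2(y, x)
        Pw = wrap_M(Phi, M)
        A1 += (R * math.sin(Phi)) ** 2
        A2 += (R * math.sin(Pw)) ** 2
        B1 += (R * math.cos(Phi)) ** 2
        B2 += (R * math.cos(Pw)) ** 2
        h2 += R * math.cos(Pw)
    A1, A2, B1, B2, h2 = (v / n_mc for v in (A1, A2, B1, B2, h2))
    boundary_sum = sum(g_gaussian(math.pi * (2 * k + 1) / M, s) for k in range(M))
    H = h2 - 2 * math.sin(math.pi / M) * boundary_sum
    return {"A1": A1, "A2": A2, "B1": B1 - 1.0, "B2": B2 - h2 ** 2, "h2": h2, "H": H}


def asymptotic_variances(c, p, d, rho0):
    """Diagonal of the Theorem 2 covariance matrix (divide by L for finite L)."""
    var_phase = (p * c["A1"] + d * c["A2"]) / (p + c["H"] * d) ** 2
    var_amp = rho0 ** 2 * (p * c["B1"] + d * c["B2"])
    return var_phase, var_amp
```

For Gaussian noise $A_1 = B_1 = s^2$, and at high SNR $H$ approaches 1, which gives simple sanity checks on the output.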

The proof of Theorem 1 is in Section V and the proof of Theorem 2 is in Section VI. Before giving the proofs we discuss the assumptions made by these theorems. The assumption that $w_1, \dots, w_L$ are circularly symmetric can be relaxed, but this comes at the expense of making the theorem statements more complicated. If $w_i$ is not circularly symmetric then the distribution of $R_i$ and $\Phi_i$ may depend on $a_0$ and also on the transmitted symbols $\{s_i, i \in P \cup D\}$. As a result the asymptotic variance described in Theorem 2 depends on $a_0$ and $\{s_i, i \in P \cup D\}$, rather than just $\rho_0$. The circularly symmetric assumption may not always hold in practice, but we feel it provides a sensible trade-off between simplicity and generality.

The assumption that $\mathrm{E}|w_1|^2 = \mathrm{E}|w_i|^2$ is finite implies that $R_i$ has finite variance since $\mathrm{E}R_i^2 = 1 + \mathrm{E}|w_i|^2/\rho_0^2$. This is required in Theorem 2 so that the constants $A_1$, $A_2$, $B_1$ and $B_2$ exist. We will also use the fact that $R_i$ has finite variance to simplify the proof of Theorem 1 by use of Kolmogorov's strong law of large numbers [13].

The theorems place conditions on $\langle \hat{\theta} - \theta_0 \rangle_\pi$ rather than directly on $\hat{\theta} - \theta_0$. This makes sense because the phases $\theta_0$ and $\theta_0 + 2\pi k$ are equivalent for any integer $k$. So, for example, we expect the phases $0.99\pi$ and $-0.99\pi$ to be close together, the difference between them being $|\langle -0.99\pi - 0.99\pi \rangle_\pi| = 0.02\pi$, and not $|-0.99\pi - 0.99\pi| = 1.98\pi$.

Theorem 2 requires the function $g$ to be continuous at $\frac{\pi(2k+1)}{M}$ for each $k = 0, \dots, M-1$. This places mild restrictions on the distribution of the noise $w_i$. For example, the requirements are satisfied if the joint pdf of the real and imaginary parts of $w_i$ is continuous, since in this case $f(r, \phi)$ is continuous. Because $f(r, \phi)$ has period $2\pi$ and is even on $[-\pi, \pi]$ with respect to $\phi$ it follows that $g$ has period $2\pi$ and is even on $[-\pi, \pi]$.

A key assumption in Theorem 1 is that $G(x)$ is uniquely maximised at $x = 0$ for $x \in [-\pi, \pi)$. This assumption asserts that $G(x) < G(0)$ for all nonzero $x \in [-\pi, \pi)$ and that if $\{x_i\}$ is a sequence of numbers from $[-\pi, \pi)$ such that $G(x_i) \to G(0)$ as $i \to \infty$ then $x_i \to 0$ as $i \to \infty$. Although we will not prove it here, this assumption is not only sufficient, but also necessary, for if $G(x)$ is uniquely maximised at some $x \neq 0$ then $\langle \hat{\theta} - \theta_0 \rangle_\pi \to x$ almost surely as $L \to \infty$, while if $G(x)$ is not uniquely maximised then $\langle \hat{\theta} - \theta_0 \rangle_\pi$ will not converge. One can check that this assumption holds when $w_i$ is circularly symmetric and normally distributed. We will not attempt to further classify those distributions for which the assumption holds here.

Theorem 1 defines real numbers $p$ and $d$ to represent the proportion of pilot symbols and data symbols in the limit as $L$ goes to infinity. For Theorem 2 we need the slightly stronger condition that

$$\frac{|P|}{L} = p + o(L^{-1/2}) \quad \text{and} \quad \frac{|D|}{L} = d + o(L^{-1/2}).$$

This stronger condition is required to prove the asymptotic normality of $\sqrt{L}m_L$.

The next two sections give proofs of Theorems 1 and 2. The proofs make use of various lemmas, which are proved in the appendix.

V. Proof of almost sure convergence (Theorem 1)

Substituting $\{\hat{d}_i(\theta), i \in D\}$ from (8) into (7) we obtain $SS$ minimised with respect to the data symbols,

$$SS(\rho, \theta) = A - \rho Z(\theta) - \rho Z^*(\theta) + L\rho^2,$$

where

$$Z(\theta) = \sum_{i \in P} y_i e^{-j\theta} p_i^* + \sum_{i \in D} y_i e^{-j\theta} \hat{d}_i^*(\theta),$$

and $Z^*(\theta)$ is the conjugate of $Z(\theta)$. Differentiating with respect to $\rho$ and setting the resulting expression to zero gives the least squares estimator of $\rho_0$ as a function of $\theta$,

$$\hat{\rho}(\theta) = \frac{Z(\theta) + Z^*(\theta)}{2L} = \frac{1}{L}\Re(Z(\theta)), \tag{18}$$

where $\Re(\cdot)$ denotes the real part. Substituting this expression into $SS(\rho, \theta)$ gives $SS$ minimised with respect to $\rho$ and the data symbols,

$$SS(\theta) = A - \frac{1}{L}\Re(Z(\theta))^2.$$

We again abuse notation by reusing $SS$, but this should not cause confusion as $SS(\rho, \theta)$ and $SS(\theta)$ are easily told apart by their inputs. By definition the amplitude $\rho_0$ and its estimator $\hat{\rho}$ are positive. However, $\hat{\rho}(\theta) = \frac{1}{L}\Re(Z(\theta))$ may take negative values for some $\theta \in [-\pi, \pi)$. The least squares estimator $\hat{\theta}$ of $\theta_0$ is the minimiser of $SS(\theta)$ under the constraint $\hat{\rho}(\theta) = \frac{1}{L}\Re(Z(\theta)) > 0$. Equivalently, $\hat{\theta}$ is the maximiser of $\Re(Z(\theta))$ with no constraints required.
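The characterisation of $\hat{\theta}$ as the unconstrained maximiser of $\Re(Z(\theta))$ suggests a simple, if slower, cross-check of Algorithm 1: evaluate $\Re(Z(\theta))$ on a fine grid. This Python sketch is a hedged illustration; the grid search is a substitute technique rather than the method of Section II, its accuracy is limited to the grid spacing, and the names `re_Z` and `grid_estimate` are hypothetical.

```python
import cmath
import math


def re_Z(theta, y_pilot, p, y_data, M):
    """Re Z(theta), with the data symbols replaced by the hard decisions of (8)."""
    step = 2 * math.pi / M
    rot = cmath.exp(-1j * theta)
    Z = sum(v * rot * q.conjugate() for v, q in zip(y_pilot, p))
    for v in y_data:
        u = step * math.floor((cmath.phase(v) - theta) / step + 0.5)   # u_i(theta)
        Z += v * rot * cmath.exp(-1j * u)                              # y_i e^{-j theta} d_i*(theta)
    return Z.real


def grid_estimate(y_pilot, p, y_data, M, n_grid=4096):
    """Approximate (theta_hat, rho_hat) by maximising Re Z(theta) on a uniform grid.

    Costs O(L * n_grid) operations and is accurate only to the spacing
    2*pi/n_grid, whereas Algorithm 1 finds the exact maximiser.
    """
    L = len(y_pilot) + len(y_data)
    grid = [-math.pi + 2 * math.pi * k / n_grid for k in range(n_grid)]
    theta_hat = max(grid, key=lambda t: re_Z(t, y_pilot, p, y_data, M))
    return theta_hat, re_Z(theta_hat, y_pilot, p, y_data, M) / L   # rho_hat from (18)


# Noiseless example with rho0 = 1.5 and theta0 = 0.6.
a0 = 1.5 * cmath.exp(0.6j)
p = [1, -1]
d = [1j, -1j, 1, -1]
theta_hat, rho_hat = grid_estimate([a0 * q for q in p], p, [a0 * x for x in d], 4)
```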

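We are thus interested in analysing the behaviour of the maximiser of $\Re(Z(\theta))$. Recalling the definition of $R_i$ and $\Phi_i$ from (17),

$$y_i = a_0 s_i + w_i = a_0 s_i\left(1 + \frac{w_i}{a_0 s_i}\right) = \rho_0 R_i e^{j(\theta_0 + \Phi_i + \angle s_i)},$$

and recall that $\frac{|P|}{L} \to p$ and $\frac{|D|}{L} \to d$ as $L \to \infty$. Recalling the definition of $\hat{d}_i(\theta)$ and $u_i(\theta)$ from (8),

$$\begin{aligned} u_i(\theta) &= \lfloor \angle y_i - \theta \rceil \\ &= \lfloor \theta_0 + \Phi_i + \angle s_i - \theta \rceil \\ &= \lfloor \langle \theta_0 - \theta \rangle_\pi + \Phi_i + \angle s_i \rceil \pmod{2\pi} \\ &= \lfloor \lambda + \Phi_i \rceil + \angle s_i, \end{aligned}$$

where we put $\lambda = \langle \theta_0 - \theta \rangle_\pi$ and where, as in Section II, we consider $u_i(\theta)$ equivalent modulo $2\pi$. Because $\hat{d}_i^*(\theta) = e^{-ju_i(\theta)}$, it follows that, when $i \in D$,

$$y_i e^{-j\theta}\hat{d}_i^*(\theta) = \rho_0 R_i e^{j(\lambda + \Phi_i + \angle s_i - \lfloor \lambda + \Phi_i \rceil - \angle s_i)} = \rho_0 R_i e^{j\langle \lambda + \Phi_i \rangle}, \tag{19}$$

since $\lfloor x + \angle s_i \rceil = \lfloor x \rceil + \angle s_i$ for all $x \in \mathbb{R}$ as a result of $\angle s_i$ being a multiple of $\frac{2\pi}{M}$. Otherwise, when $i \in P$,

$$y_i e^{-j\theta} p_i^* = \rho_0 R_i e^{j(\lambda + \Phi_i)}.$$

Now,

$$Z(\theta) = \rho_0\sum_{i \in P} R_i e^{j(\lambda + \Phi_i)} + \rho_0\sum_{i \in D} R_i e^{j\langle \lambda + \Phi_i \rangle}.$$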

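Let

$$G_L(\lambda) = \frac{1}{\rho_0 L}\Re(Z(\theta)) = \frac{1}{L}\sum_{i \in P} R_i\cos(\lambda + \Phi_i) + \frac{1}{L}\sum_{i \in D} R_i\cos\langle \lambda + \Phi_i \rangle \tag{20}$$

and put $\lambda_L = -\langle \hat{\theta} - \theta_0 \rangle_\pi = \langle \theta_0 - \hat{\theta} \rangle_\pi$. Since $\hat{\theta}$ is the maximiser of $\Re(Z(\theta))$ it follows that $\lambda_L$ is the maximiser of $G_L(\lambda)$. We will show that $\lambda_L$ converges almost surely to zero as $L \to \infty$. The proof of part 1 of Theorem 1 follows from this.

Recall the functions $G$, $h_1$ and $h_2$ defined in the statement of Theorem 1. Observe that

$$\mathrm{E}G_L(\lambda) = \frac{|P|}{L}h_1(\lambda) + \frac{|D|}{L}h_2(\lambda)$$

and

$$\lim_{L \to \infty}\mathrm{E}G_L(\lambda) = G(\lambda) = p h_1(\lambda) + d h_2(\lambda).$$

As is customary, let $\Omega$ be the sample space on which the random variables $\{w_i\}$ are defined. Let $\mathcal{A}$ be the subset of the sample space $\Omega$ on which $G(\lambda_L) \to G(0)$ as $L \to \infty$. Lemma 1 shows that $\Pr\{\mathcal{A}\} = 1$. Let $\mathcal{A}'$ be the subset of the sample space on which $\lambda_L \to 0$ as $L \to \infty$. Because $G(x)$ is uniquely maximised at $x = 0$, it follows that $G(\lambda_L) \to G(0)$ only if $\lambda_L \to 0$ as $L \to \infty$. So $\mathcal{A} \subseteq \mathcal{A}'$ and therefore $\Pr\{\mathcal{A}'\} \ge \Pr\{\mathcal{A}\} = 1$. Part 1 of Theorem 1 follows.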

It remains to prove part 2 of the theorem regarding the convergence of the amplitude estimator $\hat{\rho}$. From (18),

$$\hat{\rho} = \frac{1}{L}\Re(Z(\hat{\theta})) = \rho_0 G_L(\lambda_L). \tag{21}$$

Lemma 8 in the appendix shows that $G_L(\lambda_L)$ converges almost surely to $G(0)$ as $L \to \infty$, and $\hat{\rho}$ consequently converges almost surely to $\rho_0 G(0)$ as required. It remains to prove Lemmas 1 and 8. These are proved in Section A of the appendix.

VI. Proof of asymptotic normality (Theorem 2)

We first prove the asymptotic normality of $\sqrt{L}\lambda_L$. Once this is done we will be able to prove the normality of $\sqrt{L}m_L$. Recall that $\lambda_L$ is the maximiser of the function $G_L$ defined in (20). The proof is complicated by the fact that $G_L$ is not differentiable everywhere due to the function $\langle \cdot \rangle$ not being differentiable at odd multiples of $\frac{\pi}{M}$. This prevents the use of 'standard approaches' to proving normality that are based on the mean value theorem [14]-[19]. However, Lemma 9 shows that the derivative $G_L'$ does exist, and is equal to zero, at $\lambda_L$. Similar properties have been used by some of the present authors to analyse the behaviour of polynomial-phase estimators [20]. Define the function

$$R_L(\lambda) = \frac{1}{L}\sum_{i \in P} R_i\sin(\lambda + \Phi_i) + \frac{1}{L}\sum_{i \in D} R_i\sin\langle \lambda + \Phi_i \rangle. \tag{22}$$

Whenever $G_L(\lambda)$ is differentiable, $G_L'(\lambda) = -R_L(\lambda)$, and so $R_L(\lambda_L) = -G_L'(\lambda_L) = 0$ by Lemma 9. Let $Q_L(\lambda) = \mathrm{E}R_L(\lambda) - \mathrm{E}R_L(0)$ and write

$$\begin{aligned} 0 &= R_L(\lambda_L) - Q_L(\lambda_L) + Q_L(\lambda_L) \\ &= \sqrt{L}\left(R_L(\lambda_L) - Q_L(\lambda_L)\right) + \sqrt{L}Q_L(\lambda_L). \end{aligned}$$

Lemma 11 shows that

$$\sqrt{L}Q_L(\lambda_L) = \sqrt{L}\lambda_L\left(p + Hd + o_p(1)\right),$$

where $o_p(1)$ denotes a sequence of random variables converging in probability to zero as $L \to \infty$, and $p$, $d$ and $H$ are defined in the statements of Theorems 1 and 2. Lemma 16 shows that

$$\sqrt{L}\left(R_L(\lambda_L) - Q_L(\lambda_L)\right) = o_p(1) + \sqrt{L}R_L(0).$$

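It follows from the three equations above that

$$0 = o_p(1) + \sqrt{L}R_L(0) + \sqrt{L}\lambda_L\left(p + Hd + o_p(1)\right),$$

and rearranging gives

$$\sqrt{L}\lambda_L = \frac{o_p(1) - \sqrt{L}R_L(0)}{p + Hd + o_p(1)}.$$

Lemma 11 shows that the distribution of $\sqrt{L}R_L(0)$ converges to the normal with zero mean and variance $pA_1 + dA_2$, where $A_1$ and $A_2$ are defined in the statement of Theorem 2. It follows that the distribution of $\sqrt{L}\lambda_L$ converges to the normal with zero mean and variance

$$\frac{pA_1 + dA_2}{(p + Hd)^2}.$$

We now analyse the asymptotic distribution of $\sqrt{L}m_L$. Let $T_L(\lambda) = \mathrm{E}G_L(\lambda)$. Using (21),

$$\begin{aligned} \sqrt{L}m_L &= \sqrt{L}\rho_0\left(G_L(\lambda_L) - G(0)\right) \\ &= \sqrt{L}\rho_0\left(G_L(\lambda_L) - T_L(\lambda_L) + T_L(\lambda_L) - G(0)\right). \end{aligned}$$

Lemma 18 shows that

$$\sqrt{L}\left(G_L(\lambda_L) - T_L(\lambda_L)\right) = o_p(1) + X_L,$$

where $X_L = \sqrt{L}\left(G_L(0) - T_L(0)\right)$. Lemma 19 shows that

$$\sqrt{L}\left(T_L(\lambda_L) - G(0)\right) = o_p(1).$$

It follows that $\sqrt{L}m_L = \rho_0 X_L + o_p(1)$. Lemma 20 shows that the distribution of $X_L$ converges to the normal with zero mean and variance $pB_1 + dB_2$ as $L \to \infty$, where $B_1$ and $B_2$ are defined in the statement of Theorem 2. Thus, the distribution of $\sqrt{L}m_L$ converges to the normal with zero mean and variance $\rho_0^2(pB_1 + dB_2)$ as required. Finally, from the expression for $\sqrt{L}\lambda_L$ above, $\sqrt{L}\lambda_L$ is asymptotically a multiple of $\sqrt{L}R_L(0)$, while $X_L$ and $\sqrt{L}R_L(0)$ are uncorrelated: because $f(r, \phi)$ is even in $\phi$, the quantities $\mathrm{E}R_1\sin\Phi_1$, $\mathrm{E}R_1^2\cos\Phi_1\sin\Phi_1$ and their counterparts with $\langle \Phi_1 \rangle$ in place of $\Phi_1$ are all zero. So

$$\operatorname{cov}(\sqrt{L}m_L, \sqrt{L}\lambda_L) \to 0$$

as $L \to \infty$. The lemmas that we have used are proved in Section B of the appendix.

VII. The Gaussian noise case

Let the noise sequence $\{w_i\}$ be complex Gaussian with independent real and imaginary parts having zero mean and variance $\sigma^2$. The joint density function of the real and imaginary parts is

$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma^2}e^{-\frac{x^2 + y^2}{2\sigma^2}}.$$

Theorems 1 and 2 hold, and since the distribution of $w_i$ is circularly symmetric, the distribution of $R_1 e^{j\Phi_1}$ is identical to the distribution of $1 + \frac{1}{\rho_0}w_1$. It can be shown that

$$g(\phi) = \frac{e^{-\kappa^2}\cos\phi}{2\pi} + \frac{(1 + 2\kappa^2\cos^2\phi)\,e^{-\kappa^2\sin^2\phi}}{2\sqrt{\pi}\,\kappa}\,\Phi(\sqrt{2}\kappa\cos\phi),$$

where $\kappa = \frac{\rho_0}{\sqrt{2}\sigma}$ and $\Phi(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-x^2/2}\,dx$ is the cumulative distribution function of the standard normal. The value of $H$ follows directly from this formula, and $A_1$, $A_2$, $B_1$ and $B_2$ can be efficiently computed by numerical integration using (14).

VIII. Simulations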
We present the results of Monte Carlo simulations with the least squares estimator. In all simulations the noise samples $w_1, \dots, w_L$ are independent and identically distributed, circularly symmetric and Gaussian with real and imaginary parts having variance $\sigma^2$. Under these conditions the least squares estimator is also the maximum likelihood estimator.

Simulations are run with $M = 2, 4, 8$ (BPSK, QPSK, 8-PSK) and with signal to noise ratio $\mathrm{SNR} = \frac{\rho_0^2}{2\sigma^2}$ between $-20$ dB and $20$ dB in steps of 1 dB. The amplitude $\rho_0 = 1$ and $\theta_0$ is uniformly distributed on $[-\pi, \pi)$. For each value of SNR, $T = 5000$ replications are performed to obtain $T$ estimates $\hat{\rho}_1, \dots, \hat{\rho}_T$ and $\hat{\theta}_1, \dots, \hat{\theta}_T$.
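A reduced-scale version of this kind of simulation can be sketched in Python as follows. It is an assumption-laden illustration, not the authors' simulation code: it takes $\mathrm{SNR} = \rho_0^2/(2\sigma^2)$, places the pilots at the first `n_pilots` positions (the least squares estimator does not depend on the positions), covers only the coherent error measure $\langle \hat{\theta} - \theta_0 \rangle_\pi$, and uses far fewer replications than $T = 5000$. The names `ls_estimate` and `phase_mse` are hypothetical.

```python
import cmath
import math
import random


def ls_estimate(y_pilot, p, y_data, M):
    """Compact restatement of Algorithm 1: least squares estimate of a0."""
    L, step = len(y_pilot) + len(y_data), 2 * math.pi / M
    u = [step * math.floor(cmath.phase(v) / step + 0.5) for v in y_data]
    z = [cmath.phase(v) - ui for v, ui in zip(y_data, u)]
    g = [v * cmath.exp(-1j * ui) for v, ui in zip(y_data, u)]
    Y = sum(v * q.conjugate() for v, q in zip(y_pilot, p)) + sum(g)
    best, eta = Y, cmath.exp(2j * math.pi / M) - 1
    # Visiting sigma(0), ..., sigma(|D|-1) M times realises indices modulo |D|.
    for i in sorted(range(len(y_data)), key=z.__getitem__) * M:
        Y += eta * g[i]
        g[i] *= eta + 1
        if abs(Y) > abs(best):
            best = Y
    return best / L


def phase_mse(M, L, n_pilots, snr_db, T, seed=0):
    """Sample MSE (1/T) sum <theta_hat - theta0>_pi^2 for the coherent case.

    Assumes rho0 = 1, theta0 uniform on [-pi, pi) and SNR = rho0^2 / (2 sigma^2).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(0.5 / 10 ** (snr_db / 10))
    total = 0.0
    for _ in range(T):
        theta0 = rng.uniform(-math.pi, math.pi)
        s = [cmath.exp(2j * math.pi * rng.randrange(M) / M) for _ in range(L)]
        y = [cmath.exp(1j * theta0) * si + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for si in s]
        a_hat = ls_estimate(y[:n_pilots], s[:n_pilots], y[n_pilots:], M)
        err = cmath.phase(a_hat) - theta0
        total += (err - 2 * math.pi * math.floor(err / (2 * math.pi) + 0.5)) ** 2
    return total / T
```

When every symbol is a pilot ($p = 1$, $d = 0$), Theorem 2 predicts a phase MSE of about $A_1/L = \sigma^2/(\rho_0^2 L)$ for Gaussian noise, which gives a quick sanity check of the sketch.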

Figures 1, 2 and 3 show the sample mean square error (MSE) of the phase estimator when $M = 2, 4, 8$ with $L = 4096$ and for varying proportions of pilot symbols, including the noncoherent case $|P| = 0$. When $|P| \neq 0$ (i.e., coherent detection) the mean square error is computed as $\frac{1}{T}\sum_{i=1}^{T}\langle \hat{\theta}_i - \theta_0 \rangle_\pi^2$. Otherwise, when $|P| = 0$, the mean square error is computed as $\frac{1}{T}\sum_{i=1}^{T}\langle \hat{\theta}_i - \theta_0 \rangle^2$, as in [1]. The dots, squares, circles and crosses are the results of Monte Carlo simulations with the least squares estimator. The solid lines are the estimator MSEs predicted by Theorem 2. The prediction is made by dividing the asymptotic covariance matrix by $L$. The theorem accurately predicts the behaviour of the phase estimator when $L$ is sufficiently large. As the SNR decreases the variance of the phase estimator approaches that of the uniform distribution on $[-\pi, \pi)$ when $|P| \neq 0$ and the uniform distribution on $[-\frac{\pi}{M}, \frac{\pi}{M})$ when $|P| = 0$ [1]. Theorem 2 does not model this behaviour in the sense that for any fixed $L$ there exist sufficiently small values of SNR for which Theorem 2 does not produce accurate predictions of the MSE. As the SNR increases the variances of the estimators converge to that of the estimator where all symbols are pilots, i.e., $|P| = L$.

Figures 1, 2 and 3 also display the sample MSE of the noncoherent phase estimator of Viterbi and Viterbi [2] described by (3). This estimator requires selection of a function $F$ that transforms the amplitude of each sample prior to the final estimation step. Viterbi and Viterbi propose several viable alternatives, from which we have chosen $F(x) = 1$. The Viterbi and Viterbi estimator is only applicable in the noncoherent setting, i.e., when $|P| = 0$. The sample MSEs of the least squares estimator (when $|P| = 0$) and the Viterbi and Viterbi estimator are similar. The least squares estimator appears slightly more accurate for some values of SNR.

Figures 4, 5 and 6 show the variance of the amplitude estimator for $M = 2, 4$ and $8$ respectively, with $L = 32, 256, 2048$ and with the number of pilot symbols $|P| = 0, \frac{L}{8}, L$. The solid lines are the variances predicted by Theorem 2. The dots and crosses show the results of Monte Carlo simulations. Each point is computed as the unbiased error $\frac{1}{T}\sum_{i=1}^{T}(\hat{\rho}_i - \rho_0 G(0))^2$, where $\hat{\rho}_i$ is the amplitude estimate from the $i$th of $T$ Monte Carlo trials. This requires $G(0)$ to be known. In practice $G(0)$ may not be known at the receiver, so Figures 4, 5 and 6 serve to validate the correctness of our asymptotic theory, rather than to indicate the practical performance of the amplitude estimator. When the SNR is large, $G(0)$ is close to 1 and the bias of the amplitude estimator is small. However, $G(0)$ grows without bound as the variance of the noise increases, so the bias is significant when the SNR is small.
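The size of this bias can be explored numerically. Since $G(\lambda) = p h_1(\lambda) + d h_2(\lambda)$, we have $G(0) = p\,\mathbb{E}R_1\cos\Phi_1 + d\,\mathbb{E}R_1\cos\langle\Phi_1\rangle$ with $d = 1 - p$. The sketch below estimates $G(0)$ by Monte Carlo under an assumed noise model, $R_1e^{j\Phi_1} = 1 + z$ with $z$ circularly symmetric complex Gaussian of variance $\sigma^2$; this model, the helper names and the sample size are illustrative assumptions rather than the paper's simulation code.

```python
import numpy as np

def wrap(x, M):
    """<x>: reduce x modulo 2*pi/M into the interval [-pi/M, pi/M)."""
    return np.mod(x + np.pi / M, 2 * np.pi / M) - np.pi / M

def g0_monte_carlo(p, M, sigma2, n=200_000, seed=0):
    """Monte Carlo estimate of G(0) = p*E[R cos Phi] + (1 - p)*E[R cos<Phi>].

    ASSUMPTION (for illustration only): R*exp(j*Phi) = 1 + z, where z is
    circularly symmetric complex Gaussian with variance sigma2.
    """
    rng = np.random.default_rng(seed)
    z = np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    R, Phi = np.abs(1 + z), np.angle(1 + z)
    h1 = np.mean(R * np.cos(Phi))           # estimate of h_1(0); equals 1 in expectation
    h2 = np.mean(R * np.cos(wrap(Phi, M)))  # estimate of h_2(0)
    return p * h1 + (1 - p) * h2

for sigma2 in (0.01, 1.0, 10.0):
    print(sigma2, g0_monte_carlo(p=0.0, M=4, sigma2=sigma2))
```

With no pilots ($p = 0$) the estimate stays near 1 at high SNR and increases markedly as $\sigma^2$ grows, in line with the discussion above; with $p = 1$ it remains close to 1 because $h_1(0) = 1$.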

Figure 7 shows the MSE of the phase estimator when $M = 4$, $L = 32, 256, 2048$ and the number of pilots is $|P| = \frac{L}{8}, L$. The figure depicts an interesting phenomenon. When $L = 2048$ and $|P| = \frac{L}{8} = 256$, the number of pilot symbols is the same as when $L = |P| = 256$. When the SNR is small (less than approximately 0 dB), the least squares estimator using the 256 pilot symbols and also the $2048 - 256 = 1792$ data symbols performs worse than the estimator that uses only the 256 pilot symbols. A similar phenomenon occurs when $L = 256$ and $|P| = \frac{L}{8} = 32$. This behaviour suggests modifying the objective function to give the pilot symbols more importance when the SNR is low. For example, rather than minimise (2) we could instead minimise a weighted version of it,

$$SS_\beta(a, \{d_i, i \in D\}) = \sum_{i \in P}|y_i - a s_i|^2 + \beta\sum_{i \in D}|y_i - a d_i|^2,$$

where the weight $\beta$ would be small when the SNR is small and near 1 when the SNR is large. Computing the $\hat{a}$ that minimises $SS_\beta$ requires only a minor modification to Algorithm 1: Line 5 is modified so that $g_i$ is multiplied by $\beta$, and lines 7 and 11 are modified to $\hat{a} = \frac{1}{|P| + \beta|D|}Y$. For fixed $\beta$, the asymptotic properties of this weighted estimator could be derived using the techniques we have developed in Sections IV, V and VI. This would enable a rigorous theory for the selection of $\beta$ at the receiver. One caveat is that the receiver would require knowledge of the noise distribution in order to choose $\beta$ advantageously. We do not investigate this further here.
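To see why only a scale change is needed, note that with $|s_i| = |d_i| = 1$, $SS_\beta = \sum_{i\in P}|y_i|^2 + \beta\sum_{i\in D}|y_i|^2 - 2\Re(a^*Y_\beta) + |a|^2(|P| + \beta|D|)$, where $Y_\beta = \sum_{i\in P}y_is_i^* + \beta\sum_{i\in D}y_id_i^*$. Hence $\hat{a} = Y_\beta/(|P| + \beta|D|)$, and the data symbols should maximise $|Y_\beta| = \max_\theta \Re(e^{-j\theta}Y_\beta)$. The sketch below does this by brute-force grid search over a trial phase $\theta$, not by the paper's $O(L\log L)$ Algorithm 1; the function name, the grid size and the test signal are assumptions for illustration.

```python
import numpy as np

def weighted_ls_grid(y_pilot, s_pilot, y_data, M, beta, n_grid=4096):
    """Approximate minimiser of SS_beta over a and {d_i, i in D}.

    Brute-force stand-in for a modified Algorithm 1 (NOT the O(L log L) method):
    maximise |Y_beta| by searching a grid of trial phases theta. For each theta,
    every data symbol is the M-PSK point maximising Re(y_i conj(d_i) e^{-j theta}).
    """
    const = np.exp(2j * np.pi * np.arange(M) / M)  # M-PSK constellation
    Yp = np.sum(y_pilot * np.conj(s_pilot))        # pilot part of Y_beta
    best_obj, best_d = -np.inf, None
    for theta in np.linspace(-np.pi, np.pi, n_grid, endpoint=False):
        # scores[i, m] = Re(y_i * exp(-j theta) * conj(const[m]))
        scores = np.real(np.outer(y_data * np.exp(-1j * theta), np.conj(const)))
        obj = np.real(np.exp(-1j * theta) * Yp) + beta * scores.max(axis=1).sum()
        if obj > best_obj:
            best_obj, best_d = obj, const[scores.argmax(axis=1)]
    Y_beta = Yp + beta * np.sum(y_data * np.conj(best_d))
    return Y_beta / (len(y_pilot) + beta * len(y_data))  # a_hat = Y_beta / (|P| + beta|D|)

# Example: QPSK, 8 pilots, 24 data symbols, no noise, beta = 0.3
rng = np.random.default_rng(1)
M, a0 = 4, 2.0 * np.exp(0.4j)
s_p = np.exp(2j * np.pi * rng.integers(0, M, 8) / M)
s_d = np.exp(2j * np.pi * rng.integers(0, M, 24) / M)
a_hat = weighted_ls_grid(a0 * s_p, s_p, a0 * s_d, M, beta=0.3)
print(abs(a_hat - a0))  # essentially zero in this noiseless example
```

The grid search costs $O(L\,N_{\mathrm{grid}})$ operations, so it is only useful as a reference against which a faster implementation could be checked.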

IX. Conclusion 

We considered least squares estimators of carrier phase and amplitude from noisy communications signals that contain both pilot signals, known to the receiver, and data signals, unknown to the receiver. We focused on $M$-ary phase shift keying constellations. The least squares estimator can be computed in $O(L \log L)$ operations using a modification of an algorithm due to Mackenthun [7], and is the maximum likelihood estimator in the case that the noise is additive white and Gaussian.

We showed, under some reasonably general conditions on the distribution of the noise, that the phase estimator $\hat{\theta}$ is strongly consistent and asymptotically normally distributed. However, the amplitude estimator $\hat{\rho}$ is biased, and converges to $G(0)\rho_0$. This bias is large when the signal to noise ratio is small. It would be interesting to investigate methods for correcting this bias; a method for estimating $G(0)$ at the receiver appears to be required.

Monte Carlo simulations were used to assess the perfor- 
mance of the least squares estimator and also to validate 
our asymptotic theory. Interestingly, when the SNR is small, 
it is counterproductive to use the data symbols to estimate 
the phase (Figure 7). This suggests the use of a weighted
objective function, which would be an interesting topic for 
future research. 
Fig. 1. Phase error versus SNR for BPSK with L = 4096.

Fig. 2. Phase error versus SNR for QPSK with L = 4096.

Fig. 3. Phase error versus SNR for 8-PSK with L = 4096.

Fig. 4. Unbiased amplitude error versus SNR for BPSK.

Fig. 5. Unbiased amplitude error versus SNR for QPSK.

Fig. 6. Unbiased amplitude error versus SNR for 8-PSK.

Fig. 7. Phase error versus SNR for QPSK.
Appendix

A. Lemmas required for the proof of almost sure convergence (Theorem 1)

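Lemma 1. $G(\hat{\lambda}_L) \to G(0)$ almost surely as $L \to \infty$.

Proof: Since $G(x)$ is uniquely maximised at $x = 0$,
$$0 \leq G(0) - G(\hat{\lambda}_L),$$
and since $\hat{\lambda}_L$ is the maximiser of $G_L(x)$,
$$0 \leq G_L(\hat{\lambda}_L) - G_L(0).$$
Thus,
$$\begin{aligned}
0 &\leq G(0) - G(\hat{\lambda}_L) \\
&\leq G(0) - G(\hat{\lambda}_L) + G_L(\hat{\lambda}_L) - G_L(0) \\
&\leq |G(0) - G_L(0)| + |G_L(\hat{\lambda}_L) - G(\hat{\lambda}_L)| \\
&\leq 2\sup_{\lambda \in [-\pi, \pi)}|G_L(\lambda) - G(\lambda)|,
\end{aligned}$$
and the last line converges almost surely to zero by Lemma 2.

■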

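Lemma 2. $\sup_{\lambda \in [-\pi, \pi)}|G_L(\lambda) - G(\lambda)| \to 0$ almost surely as $L \to \infty$.

Proof: Put $T_L(\lambda) = \mathbb{E}G_L(\lambda)$ and write
$$\begin{aligned}
\sup_{\lambda \in [-\pi,\pi)}|G_L(\lambda) - G(\lambda)|
&= \sup_{\lambda \in [-\pi,\pi)}|G_L(\lambda) - T_L(\lambda) + T_L(\lambda) - G(\lambda)| \\
&\leq \sup_{\lambda \in [-\pi,\pi)}|G_L(\lambda) - T_L(\lambda)| + \sup_{\lambda \in [-\pi,\pi)}|T_L(\lambda) - G(\lambda)|.
\end{aligned}$$
Now,
$$\begin{aligned}
T_L(\lambda) - G(\lambda) &= \left(\tfrac{|P|}{L} - p\right)h_1(\lambda) + \left(\tfrac{|D|}{L} - d\right)h_2(\lambda) \\
&= o(1)h_1(\lambda) + o(1)h_2(\lambda).
\end{aligned}$$
Since
$$|h_1(\lambda)| = |\mathbb{E}R_1\cos(\lambda + \Phi_1)| \leq \mathbb{E}R_1$$
and
$$|h_2(\lambda)| = |\mathbb{E}R_1\cos\langle\lambda + \Phi_1\rangle| \leq \mathbb{E}R_1$$
for all $\lambda \in [-\pi, \pi)$, it follows that
$$\sup_{\lambda \in [-\pi,\pi)}|T_L(\lambda) - G(\lambda)| \leq o(1)\,\mathbb{E}R_1 \to 0$$
as $L \to \infty$. Lemma 3 shows that
$$\sup_{\lambda \in [-\pi,\pi)}|G_L(\lambda) - T_L(\lambda)| \to 0$$
almost surely as $L \to \infty$. ■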
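Lemma 3. Put $T_L(\lambda) = \mathbb{E}G_L(\lambda)$. Then
$$\sup_{\lambda \in [-\pi,\pi)}|G_L(\lambda) - T_L(\lambda)| \to 0$$
almost surely as $L \to \infty$.

Proof: Put $D_L(\lambda) = G_L(\lambda) - T_L(\lambda)$ and let
$$\lambda_n = \frac{2\pi}{N}(n - 1) - \pi, \qquad n = 1, \dots, N,$$
be $N$ points uniformly spaced on the interval $[-\pi, \pi)$. Let $\mathcal{L}_n = [\lambda_n, \lambda_n + \frac{2\pi}{N})$ and observe that $\mathcal{L}_1, \dots, \mathcal{L}_N$ partition $[-\pi, \pi)$. Now
$$\begin{aligned}
\sup_{\lambda \in [-\pi,\pi)}|D_L(\lambda)|
&= \sup_{n=1,\dots,N}\,\sup_{\lambda \in \mathcal{L}_n}|D_L(\lambda) - D_L(\lambda_n) + D_L(\lambda_n)| \\
&\leq U_L + V_L,
\end{aligned}$$
where
$$U_L = \sup_{n=1,\dots,N}|D_L(\lambda_n)| \quad\text{and}\quad V_L = \sup_{n=1,\dots,N}\,\sup_{\lambda \in \mathcal{L}_n}|D_L(\lambda) - D_L(\lambda_n)|.$$
Lemma 4 shows that for any $N$ and $\epsilon > 0$,
$$\Pr\left\{\lim_{L\to\infty}U_L > \epsilon\right\} = 0,$$
that is, $U_L \to 0$ almost surely as $L \to \infty$. Lemma 5 shows that for any $\epsilon > 0$,
$$\Pr\left\{\lim_{L\to\infty}V_L > \epsilon + \tfrac{4\pi}{N}\mathbb{E}R_1\right\} = 0.$$
If we choose $N$ large enough that $4\pi\mathbb{E}R_1 < \epsilon N$, then
$$\begin{aligned}
\Pr\left\{\lim_{L\to\infty}\sup_{\lambda\in[-\pi,\pi)}|D_L(\lambda)| > 3\epsilon\right\}
&\leq \Pr\left\{\lim_{L\to\infty}(U_L + V_L) > 3\epsilon\right\} \\
&\leq \Pr\left\{\lim_{L\to\infty}(U_L + V_L) > \epsilon + \epsilon + \tfrac{4\pi}{N}\mathbb{E}R_1\right\} \\
&\leq \Pr\left\{\lim_{L\to\infty}U_L > \epsilon\right\} + \Pr\left\{\lim_{L\to\infty}V_L > \epsilon + \tfrac{4\pi}{N}\mathbb{E}R_1\right\} \\
&= 0.
\end{aligned}$$
Thus $\sup_{\lambda\in[-\pi,\pi)}|D_L(\lambda)| \to 0$ almost surely as $L \to \infty$. ■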
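Lemma 4. For any $N > 0$, $U_L \to 0$ almost surely as $L \to \infty$, where $U_L$ is defined in the proof of Lemma 3.

Proof: Put
$$Z_i(\lambda) = \begin{cases} R_i\cos(\lambda + \Phi_i), & i \in P, \\ R_i\cos\langle\lambda + \Phi_i\rangle, & i \in D, \end{cases}$$
so that
$$D_L(\lambda) = G_L(\lambda) - T_L(\lambda) = \frac{1}{L}\sum_{i \in P\cup D}\big(Z_i(\lambda) - \mathbb{E}Z_i(\lambda)\big).$$
Now $Z_1(\lambda_n), \dots, Z_L(\lambda_n)$ are independent with finite variance (because $\mathbb{E}R_1^2$ is finite), so for each $n = 1, \dots, N$,
$$|D_L(\lambda_n)| = \left|\frac{1}{L}\sum_{i \in P\cup D}\big(Z_i(\lambda_n) - \mathbb{E}Z_i(\lambda_n)\big)\right| \to 0$$
almost surely as $L \to \infty$ by Kolmogorov's strong law of large numbers [13]. Thus
$$U_L = \sup_{n=1,\dots,N}|D_L(\lambda_n)| \leq \sum_{n=1}^{N}|D_L(\lambda_n)| \to 0$$
almost surely as $L \to \infty$. ■

Lemma 5. For any $\epsilon > 0$,
$$\Pr\left\{\lim_{L\to\infty}V_L > \epsilon + \tfrac{4\pi}{N}\mathbb{E}R_1\right\} = 0.$$

Proof: Observe that
$$\begin{aligned}
|D_L(\lambda) - D_L(\lambda_n)| &= |G_L(\lambda) - T_L(\lambda) - G_L(\lambda_n) + T_L(\lambda_n)| \\
&\leq |G_L(\lambda) - G_L(\lambda_n)| + |\mathbb{E}G_L(\lambda) - \mathbb{E}G_L(\lambda_n)| \\
&\leq |G_L(\lambda) - G_L(\lambda_n)| + \mathbb{E}|G_L(\lambda) - G_L(\lambda_n)|,
\end{aligned}$$
the last line following from Jensen's inequality. Put
$$C_L = \sup_{n=1,\dots,N}\,\sup_{\lambda\in\mathcal{L}_n}|G_L(\lambda) - G_L(\lambda_n)|,$$
so that
$$\begin{aligned}
V_L &= \sup_{n=1,\dots,N}\,\sup_{\lambda\in\mathcal{L}_n}|D_L(\lambda) - D_L(\lambda_n)| \\
&\leq C_L + \sup_{n=1,\dots,N}\,\sup_{\lambda\in\mathcal{L}_n}\mathbb{E}|G_L(\lambda) - G_L(\lambda_n)| \\
&\leq C_L + \mathbb{E}C_L,
\end{aligned}$$
where the last line follows because $\sup\mathbb{E}|\cdot| \leq \mathbb{E}\sup|\cdot|$. Lemma 6 shows that $\mathbb{E}C_L \leq \frac{2\pi}{N}\mathbb{E}R_1$ and also that
$$\Pr\left\{\lim_{L\to\infty}C_L > \epsilon + \tfrac{2\pi}{N}\mathbb{E}R_1\right\} = 0.$$
Thus,
$$\begin{aligned}
\Pr\left\{\lim_{L\to\infty}V_L > \epsilon + \tfrac{4\pi}{N}\mathbb{E}R_1\right\}
&\leq \Pr\left\{\lim_{L\to\infty}(C_L + \mathbb{E}C_L) > \epsilon + \tfrac{4\pi}{N}\mathbb{E}R_1\right\} \\
&\leq \Pr\left\{\lim_{L\to\infty}C_L > \epsilon + \tfrac{2\pi}{N}\mathbb{E}R_1\right\} = 0.
\end{aligned}$$
■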

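Lemma 6. The following statements hold:

1) $\mathbb{E}C_L \leq \frac{2\pi}{N}\mathbb{E}R_1$ for all positive integers $L$,

2) for any $\epsilon > 0$, $\Pr\{\lim_{L\to\infty}C_L > \epsilon + \frac{2\pi}{N}\mathbb{E}R_1\} = 0$.

Proof: If $\lambda \in \mathcal{L}_n$, then $\lambda = \lambda_n + \delta$ with $0 \leq \delta < \frac{2\pi}{N}$ and, from Lemma 7,
$$|\cos(\lambda + \Phi_i) - \cos(\lambda_n + \Phi_i)| \leq \tfrac{2\pi}{N} \quad\text{and}\quad |\cos\langle\lambda + \Phi_i\rangle - \cos\langle\lambda_n + \Phi_i\rangle| \leq \tfrac{2\pi}{N}.$$
Because these results do not depend on $n$,
$$\sup_{n=1,\dots,N}\,\sup_{\lambda\in\mathcal{L}_n}|Z_i(\lambda) - Z_i(\lambda_n)| \leq \frac{2\pi}{N}R_i$$
for all $i \in P \cup D$. Also
$$\begin{aligned}
C_L &= \sup_{n=1,\dots,N}\,\sup_{\lambda\in\mathcal{L}_n}\left|\frac{1}{L}\sum_{i\in P\cup D}\big(Z_i(\lambda) - Z_i(\lambda_n)\big)\right| \\
&\leq \frac{1}{L}\sum_{i\in P\cup D}\,\sup_{n=1,\dots,N}\,\sup_{\lambda\in\mathcal{L}_n}|Z_i(\lambda) - Z_i(\lambda_n)| \\
&\leq \frac{2\pi}{NL}\sum_{i\in P\cup D}R_i.
\end{aligned}$$
Thus,
$$\mathbb{E}C_L \leq \frac{2\pi}{NL}\sum_{i\in P\cup D}\mathbb{E}R_i = \frac{2\pi}{N}\mathbb{E}R_1,$$
and the first statement holds. Now,
$$\frac{2\pi}{NL}\sum_{i\in P\cup D}R_i \to \frac{2\pi}{N}\mathbb{E}R_1$$
almost surely as $L \to \infty$ by the strong law of large numbers, and so, for any $\epsilon > 0$,
$$\Pr\left\{\lim_{L\to\infty}C_L > \epsilon + \tfrac{2\pi}{N}\mathbb{E}R_1\right\} \leq \Pr\left\{\lim_{L\to\infty}\frac{2\pi}{NL}\sum_{i\in P\cup D}R_i > \epsilon + \tfrac{2\pi}{N}\mathbb{E}R_1\right\} = 0.$$
■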

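Lemma 7. Let $x$ and $\delta$ be real numbers. Then
$$|\cos(x + \delta) - \cos(x)| \leq |\delta| \quad\text{and}\quad |\cos\langle x + \delta\rangle - \cos\langle x\rangle| \leq |\delta|.$$

Proof: Both $\cos(x)$ and $\cos\langle x\rangle$ are Lipschitz continuous functions from $\mathbb{R}$ to $\mathbb{R}$ with constant $K = 1$. (For $\cos\langle x\rangle$, this holds because on each interval where $\langle\cdot\rangle$ is a translation it coincides with a translate of $\cos$, and it is continuous at the interval boundaries since $\cos(\pi/M) = \cos(-\pi/M)$.) That is, for any $x$ and $y$ in $\mathbb{R}$,
$$|\cos(y) - \cos(x)| \leq K|x - y| = |x - y| \quad\text{and}\quad |\cos\langle y\rangle - \cos\langle x\rangle| \leq K|x - y| = |x - y|.$$
The lemma follows by putting $y = x + \delta$. ■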
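Lemma 7 can be sanity-checked numerically. The sketch below implements $\langle x\rangle$ as reduction modulo $2\pi/M$ into $[-\pi/M, \pi/M)$ (our reading of the notation) and tests both inequalities on random pairs $(x, \delta)$; it is a check rather than a proof, and the sample sizes and the floating-point tolerance are arbitrary choices.

```python
import numpy as np

def wrap(x, M):
    """<x>: reduce x modulo 2*pi/M into [-pi/M, pi/M)."""
    return np.mod(x + np.pi / M, 2 * np.pi / M) - np.pi / M

def lemma7_holds(M, n=100_000, seed=0, tol=1e-12):
    """Check |cos(x+d) - cos x| <= |d| and |cos<x+d> - cos<x>| <= |d| on random samples."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-10.0, 10.0, n)
    delta = rng.uniform(-1.0, 1.0, n)
    plain = np.abs(np.cos(x + delta) - np.cos(x)) <= np.abs(delta) + tol
    wrapped = np.abs(np.cos(wrap(x + delta, M)) - np.cos(wrap(x, M))) <= np.abs(delta) + tol
    return bool(plain.all()), bool(wrapped.all())

for M in (2, 4, 8):
    print(M, lemma7_holds(M))  # (True, True) for each M
```

The tolerance only absorbs rounding in `x + delta` and in the modulo reduction; the inequalities themselves are exact.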

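Lemma 8. $G_L(\hat{\lambda}_L) \to G(0)$ almost surely as $L \to \infty$.

Proof: By the triangle inequality,
$$|G_L(\hat{\lambda}_L) - G(0)| \leq |G_L(\hat{\lambda}_L) - G(\hat{\lambda}_L)| + |G(\hat{\lambda}_L) - G(0)|.$$
Now $|G_L(\hat{\lambda}_L) - G(\hat{\lambda}_L)| \to 0$ almost surely as $L \to \infty$ as a result of Lemma 2, and $|G(\hat{\lambda}_L) - G(0)| \to 0$ almost surely as $L \to \infty$ because $G$ is continuous and $\hat{\lambda}_L \to 0$ almost surely as $L \to \infty$. ■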
B. Lemmas required for the proof of asymptotic normality (Theorem 2)

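Lemma 9. The derivative of $G_L$ exists, and is equal to zero, at $\hat{\lambda}_L$. That is,
$$G_L'(\hat{\lambda}_L) = 0.$$

Proof: Observe that
$$G_L(\lambda) = \frac{1}{L}\sum_{i\in P}R_i\cos(\lambda + \Phi_i) + \frac{1}{L}\sum_{i\in D}R_i\cos\langle\lambda + \Phi_i\rangle.$$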

Now $\hat{a} = \hat{\rho}e^{j\hat{\theta}}$ from (5) and using (19).



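The second sum is differentiable everywhere except when $\langle\lambda + \Phi_i\rangle = -\frac{\pi}{M}$ for some $i \in D$ with $R_i > 0$. Let $q_i$ be the smallest number from the interval $[-\frac{\pi}{M}, 0]$ such that
$$\sin(q_i) \geq -\frac{\rho_0|\eta|^2R_i}{4L\hat{\rho}\sin(\pi/M)},$$
where $\eta = e^{-j2\pi/M} - 1$. Observe that $q_i < 0$ when $R_i > 0$. Lemma 10 shows that
$$|\langle\hat{\lambda}_L + \Phi_i\rangle| \leq \frac{\pi}{M} + q_i$$
for all $i \in D$. Thus $\langle\hat{\lambda}_L + \Phi_i\rangle \neq -\frac{\pi}{M}$ for $i \in D$ such that $R_i > 0$, and therefore $G_L$ is differentiable at $\hat{\lambda}_L$. That $G_L'(\hat{\lambda}_L) = 0$ follows since $\hat{\lambda}_L$ is a maximiser of $G_L$. ■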

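Lemma 10. Let $q_i$ be defined as in Lemma 9. Then $|\langle\hat{\lambda}_L + \Phi_i\rangle| \leq \frac{\pi}{M} + q_i$ for all $i \in D$.

Proof: Recall that $\{\hat{d}_i = d_i(\hat{\theta}), i \in D\}$ are the minimisers of the function
$$SS(\{d_i, i \in D\}) = A - \frac{1}{L}|Y(\{d_i, i \in D\})|^2$$
defined in (6). The proof now proceeds by contradiction. Assume that
$$\langle\hat{\lambda}_L + \Phi_k\rangle > \frac{\pi}{M} + q_k \tag{23}$$
for some $k \in D$. Put $r_i = \hat{d}_i$ for $i \neq k$ and $r_k = \hat{d}_k e^{j2\pi/M}$, that is, rotate the $k$th decision to the neighbouring constellation point. We will show that
$$SS(\{r_i, i \in D\}) < SS(\{\hat{d}_i, i \in D\}),$$
violating the fact that $\{\hat{d}_i, i \in D\}$ are minimisers of $SS$. First observe that
$$Y(\{r_i, i \in D\}) = \sum_{i\in P}y_is_i^* + \sum_{i\in D}y_ir_i^* = \hat{Y} + \eta y_k\hat{d}_k^*,$$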
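where $\eta = e^{-j2\pi/M} - 1$ and $\hat{Y} = Y(\{\hat{d}_i, i \in D\})$. Now,
$$\begin{aligned}
SS(\{r_i, i \in D\}) &= A - \frac{1}{L}|Y(\{r_i, i \in D\})|^2 \\
&= A - \frac{1}{L}|\hat{Y} + \eta y_k\hat{d}_k^*|^2 \\
&= A - \frac{1}{L}|\hat{Y}|^2 - \frac{2}{L}\Re(\eta\hat{Y}^*y_k\hat{d}_k^*) - \frac{1}{L}|\eta y_k|^2 \\
&= SS(\{\hat{d}_i, i \in D\}) - C,
\end{aligned}$$
where
$$C = \frac{2}{L}\Re(\eta\hat{Y}^*y_k\hat{d}_k^*) + \frac{1}{L}|\eta y_k|^2.$$
Since
$$\frac{1}{L}\hat{Y}^*y_k\hat{d}_k^* = \hat{\rho}\rho_0R_ke^{j\langle\hat{\lambda}_L + \Phi_k\rangle}$$
and $|y_k| = \rho_0R_k$, it follows that
$$C = 2\hat{\rho}\rho_0R_k\Re\left(\eta e^{j\langle\hat{\lambda}_L + \Phi_k\rangle}\right) + \frac{1}{L}|\eta|^2\rho_0^2R_k^2. \tag{24}$$
Let $v = \langle\hat{\lambda}_L + \Phi_k\rangle - \frac{\pi}{M}$, so that
$$\begin{aligned}
\eta e^{j\langle\hat{\lambda}_L + \Phi_k\rangle} &= (e^{-j2\pi/M} - 1)e^{j\pi/M}e^{jv} \\
&= (e^{-j\pi/M} - e^{j\pi/M})e^{jv} \\
&= -2j\sin\left(\tfrac{\pi}{M}\right)e^{jv},
\end{aligned}$$
and
$$\Re\left(\eta e^{j\langle\hat{\lambda}_L + \Phi_k\rangle}\right) = 2\sin\left(\tfrac{\pi}{M}\right)\sin(v).$$
Because we assumed (23), and because $\langle\cdot\rangle < \frac{\pi}{M}$, it follows that $0 > v > q_k$ and, from the definition of $q_k$,
$$-\frac{\rho_0|\eta|^2R_k}{4L\hat{\rho}\sin(\pi/M)} < \sin(v) < 0.$$
Substituting this into (24) gives $C > 0$, but then
$$SS(\{r_i, i \in D\}) < SS(\{\hat{d}_i, i \in D\}),$$
violating the fact that $\{\hat{d}_i, i \in D\}$ are minimisers of $SS$. So (23) is false.

To show that $\langle\hat{\lambda}_L + \Phi_k\rangle \geq -\frac{\pi}{M} - q_k$ we use contradiction again. Assume that $\langle\hat{\lambda}_L + \Phi_k\rangle < -\frac{\pi}{M} - q_k$, and this time put $r_i = \hat{d}_i$ for $i \neq k$ and $r_k = \hat{d}_k e^{-j2\pi/M}$. Now an analogous argument can be used to show that $SS(\{r_i, i \in D\}) < SS(\{\hat{d}_i, i \in D\})$ again. ■

Lemma 11. Let $Q_L(\lambda) = \mathbb{E}R_L(\lambda) - \mathbb{E}R_L(0)$, where the function $R_L$ is defined in (22). We have
$$\sqrt{L}\,Q_L(\hat{\lambda}_L) = \sqrt{L}\,\hat{\lambda}_L(p + Hd + o_p(1)),$$
where $p$, $d$ and $H$ are defined in the statements of Theorems 1 and 2.

Proof: We have
$$Q_L(\lambda) = \mathbb{E}R_L(\lambda) - \mathbb{E}R_L(0) = \frac{|P|}{L}k_1(\lambda) + \frac{|D|}{L}k_2(\lambda),$$
where
$$k_1(\lambda) = \mathbb{E}R_1\big(\sin(\lambda + \Phi_1) - \sin(\Phi_1)\big) \quad\text{and}\quad k_2(\lambda) = \mathbb{E}R_1\big(\sin\langle\lambda + \Phi_1\rangle - \sin\langle\Phi_1\rangle\big). \tag{25}$$
Lemma 12 shows that $k_1(\hat{\lambda}_L) = \hat{\lambda}_L(1 + o_p(1))$ and Lemma 13 shows that $k_2(\hat{\lambda}_L) = \hat{\lambda}_L(H + o_p(1))$, and so
$$\begin{aligned}
Q_L(\hat{\lambda}_L) &= \frac{|P|}{L}\hat{\lambda}_L(1 + o_p(1)) + \frac{|D|}{L}\hat{\lambda}_L(H + o_p(1)) \\
&= \hat{\lambda}_L\left(\frac{|P|}{L} + \frac{|D|}{L}H + o_p(1)\right) \\
&= \hat{\lambda}_L(p + dH + o_p(1)),
\end{aligned}$$
since $\frac{|P|}{L} \to p$ and $\frac{|D|}{L} \to d$ as $L \to \infty$. The lemma follows by multiplying both sides of the above equation by $\sqrt{L}$. ■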
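Lemma 12. Put
$$q_1(\lambda) + jk_1(\lambda) = \mathbb{E}\left[R_1e^{j(\lambda + \Phi_1)} - R_1e^{j\Phi_1}\right].$$
We have $q_1(\hat{\lambda}_L) = \hat{\lambda}_L o_p(1)$ and $k_1(\hat{\lambda}_L) = \hat{\lambda}_L(1 + o_p(1))$.

Proof: We have
$$q_1(\lambda) + jk_1(\lambda) = (e^{j\lambda} - 1)\mathbb{E}R_1e^{j\Phi_1} = e^{j\lambda} - 1,$$
since the mean of $R_1e^{j\Phi_1}$ is 1. By a first order expansion about $\lambda = 0$ we obtain
$$q_1(\lambda) + jk_1(\lambda) = \lambda(j + O(\lambda)).$$
Since $\hat{\lambda}_L$ converges almost surely to zero as $L \to \infty$, it follows that $O(\hat{\lambda}_L) = o_p(1)$. Thus
$$q_1(\hat{\lambda}_L) + jk_1(\hat{\lambda}_L) = \hat{\lambda}_L(j + o_p(1)),$$
and the lemma follows by taking real and imaginary parts. ■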

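Lemma 13. Put
$$q_2(\lambda) + jk_2(\lambda) = \mathbb{E}\left[R_1e^{j\langle\lambda + \Phi_1\rangle} - R_1e^{j\langle\Phi_1\rangle}\right].$$
We have $q_2(\hat{\lambda}_L) = \hat{\lambda}_L o_p(1)$ and $k_2(\hat{\lambda}_L) = \hat{\lambda}_L(H + o_p(1))$, where $H$ is defined in the statement of Theorem 2.

Proof: Because $\hat{\lambda}_L \to 0$ almost surely as $L \to \infty$, it is only the behaviour of $q_2(\lambda)$ and $k_2(\lambda)$ around $\lambda = 0$ that is relevant. We will examine $q_2(\lambda)$ and $k_2(\lambda)$ for $0 < \lambda < \frac{\pi}{M}$. An analogous argument follows when $-\frac{\pi}{M} < \lambda < 0$. To keep our notation clean put
$$\psi_k = \frac{2\pi}{M}k + \frac{\pi}{M}$$
with $k \in \mathbb{Z}$. When $\Phi_1 \in [\psi_{k-1}, \psi_k - \lambda)$,
$$e^{j\langle\lambda + \Phi_1\rangle} - e^{j\langle\Phi_1\rangle} = e^{j(\lambda + \Phi_1 - 2\pi k/M)} - e^{j(\Phi_1 - 2\pi k/M)} = (e^{j\lambda} - 1)e^{j\langle\Phi_1\rangle},$$
and when $\Phi_1 \in [\psi_k - \lambda, \psi_k)$,
$$e^{j\langle\lambda + \Phi_1\rangle} - e^{j\langle\Phi_1\rangle} = e^{j(\lambda + \Phi_1 - 2\pi(k+1)/M)} - e^{j(\Phi_1 - 2\pi k/M)} = \big(e^{j(\lambda - 2\pi/M)} - 1\big)e^{j\langle\Phi_1\rangle}.$$
Thus, when $\Phi_1 \in [\psi_{k-1}, \psi_k)$,
$$e^{j\langle\lambda + \Phi_1\rangle} - e^{j\langle\Phi_1\rangle} = (e^{j\lambda} - 1)e^{j\langle\Phi_1\rangle} + e^{j\lambda}(e^{-j2\pi/M} - 1)e^{j\langle\Phi_1\rangle}\chi_k(\Phi_1, \lambda),$$
where
$$\chi_k(\Phi_1, \lambda) = \begin{cases} 1, & \Phi_1 \in [\psi_k - \lambda, \psi_k), \\ 0, & \text{otherwise}. \end{cases}$$
Now
$$q_2(\lambda) + jk_2(\lambda) = \mathbb{E}\left[R_1e^{j\langle\lambda + \Phi_1\rangle} - R_1e^{j\langle\Phi_1\rangle}\right] = (e^{j\lambda} - 1)h_2(0) + e^{j\lambda}B(\lambda), \tag{26}$$
since $\mathbb{E}R_1e^{j\langle\Phi_1\rangle} = h_2(0) + j\,\mathbb{E}R_1\sin\langle\Phi_1\rangle = h_2(0)$ as a result of Lemma 15, and where
$$B(\lambda) = (e^{-j2\pi/M} - 1)\,\mathbb{E}R_1e^{j\langle\Phi_1\rangle}\chi(\Phi_1, \lambda) \tag{27}$$
and $\chi(\Phi_1, \lambda) = \sum_{k\in\mathbb{Z}}\chi_k(\Phi_1, \lambda)$.

Now $e^{j\hat{\lambda}_L} = 1 + o_p(1)$ and $e^{j\hat{\lambda}_L} - 1 = \hat{\lambda}_L(j + o_p(1))$ by the argument in Lemma 12. Also
$$B(\hat{\lambda}_L) = -\hat{\lambda}_L\left(2j\sin\left(\tfrac{\pi}{M}\right)\sum_{k=0}^{M-1}g(\psi_k) + o_p(1)\right)$$
by Lemma 14. Combining these results into (26) we obtain
$$\begin{aligned}
q_2(\hat{\lambda}_L) + jk_2(\hat{\lambda}_L) &= \hat{\lambda}_L\left(jh_2(0) - 2j\sin\left(\tfrac{\pi}{M}\right)\sum_{k=0}^{M-1}g(\psi_k) + o_p(1)\right) \\
&= \hat{\lambda}_L(jH + o_p(1)),
\end{aligned}$$
and the lemma follows by taking real and imaginary parts. ■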
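Reading off the imaginary parts of the final display above, and comparing with $k_2(\hat{\lambda}_L) = \hat{\lambda}_L(H + o_p(1))$, the constant in Lemma 13 can be written out explicitly. Assuming the definition of $H$ in Theorem 2 is the one used in this derivation, it is

$$H = h_2(0) - 2\sin\left(\frac{\pi}{M}\right)\sum_{k=0}^{M-1}g(\psi_k), \qquad \psi_k = \frac{2\pi}{M}k + \frac{\pi}{M}.$$

The first term comes from the smooth part $(e^{j\lambda} - 1)h_2(0)$ of (26), and the second from the probability mass of $\Phi_1$ that crosses a decision boundary $\psi_k$, which is captured by $B(\lambda)$.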
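Lemma 14. With $B(\lambda)$ defined in (27) we have
$$B(\hat{\lambda}_L) = -\hat{\lambda}_L\left(2j\sin\left(\tfrac{\pi}{M}\right)\sum_{k=0}^{M-1}g(\psi_k) + o_p(1)\right).$$

Proof: Put $A(\lambda) = \mathbb{E}R_1e^{j\langle\Phi_1\rangle}\chi(\Phi_1, \lambda)$. Recalling that $f(r, \phi)$ is the joint pdf of $R_1$ and $\Phi_1$, we have
$$\begin{aligned}
A(\lambda) &= \int_0^{2\pi}\int_0^{\infty}rf(r, \phi)e^{j\langle\phi\rangle}\chi(\phi, \lambda)\,dr\,d\phi \\
&= \sum_{k\in\mathbb{Z}}\int_0^{2\pi}g(\phi)e^{j\langle\phi\rangle}\chi_k(\phi, \lambda)\,d\phi \\
&= \sum_{k=0}^{M-1}\int_{\psi_k - \lambda}^{\psi_k}g(\phi)e^{j\langle\phi\rangle}\,d\phi,
\end{aligned}$$
the last line because the $\chi_k(\phi, \lambda)$ terms inside the integral are zero for all $\phi \in [0, 2\pi]$ when $k \notin \{0, \dots, M-1\}$. Observe that $\langle\phi\rangle \to \frac{\pi}{M}$ as $\phi$ approaches $\psi_k$ from below. Because $g(\phi)$ is continuous at $\psi_k$ for each $k = 0, \dots, M-1$ (by assumption in Theorem 2), we have
$$\frac{1}{\lambda}A(\lambda) \to e^{j\pi/M}\sum_{k=0}^{M-1}g(\psi_k)$$
as $\lambda$ approaches zero from above. We are only interested in the limit from above because we are working under the assumption that $0 < \lambda < \frac{\pi}{M}$ (see the proof of Lemma 13). The analogous argument when $-\frac{\pi}{M} < \lambda < 0$ would involve limits as $\lambda$ approaches zero from below. Thus
$$A(\hat{\lambda}_L) = \hat{\lambda}_L\left(e^{j\pi/M}\sum_{k=0}^{M-1}g(\psi_k) + o_p(1)\right),$$
and the lemma follows since $(e^{-j2\pi/M} - 1)e^{j\pi/M} = -2j\sin(\frac{\pi}{M})$ and $B(\lambda) = (e^{-j2\pi/M} - 1)A(\lambda)$. ■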

Lemma 15. $\mathbb{E}R_1\sin\langle\Phi_1\rangle = 0$.

Proof: Recalling that $f(r, \phi)$ is the joint pdf of $R_1$ and $\Phi_1$, we have
$$\mathbb{E}R_1\sin\langle\Phi_1\rangle = \int_{-\pi}^{\pi}\int_0^{\infty}r\sin\langle\phi\rangle f(r, \phi)\,dr\,d\phi = \int_{-\pi}^{\pi}\sin\langle\phi\rangle g(\phi)\,d\phi.$$
The proof is immediate since $g(\phi)$ is even and $\sin\langle\phi\rangle$ is odd. ■



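Lemma 16. Let $Q_L(\lambda) = \mathbb{E}R_L(\lambda) - \mathbb{E}R_L(0)$, where the function $R_L$ is defined in (22). Then
$$\sqrt{L}\big(R_L(\hat{\lambda}_L) - Q_L(\hat{\lambda}_L)\big) = \sqrt{L}R_L(0) + o_p(1).$$

Proof: Write
$$\sqrt{L}\big(R_L(\hat{\lambda}_L) - Q_L(\hat{\lambda}_L)\big) = W_L(\hat{\lambda}_L) + \sqrt{L}R_L(0),$$
where
$$W_L(\lambda) = \sqrt{L}\big(R_L(\lambda) - Q_L(\lambda) - R_L(0)\big) \tag{28}$$
is what is called an empirical process indexed by $\lambda$. Techniques from this literature can be used to show that for any $\delta > 0$ and $\nu > 0$, there exists $\epsilon > 0$ such that
$$\Pr\left\{\sup_{|\lambda| < \epsilon}|W_L(\lambda)| > \delta\right\} < \nu$$
for all positive integers $L$. This type of result is typically called tightness or asymptotic continuity [18], [19]. We omit the proof, which follows in a straightforward but lengthy manner using an argument called symmetrisation followed by an argument called chaining [18, 19].

Since $\hat{\lambda}_L$ converges almost surely to zero, it follows that for any $\epsilon > 0$,
$$\lim_{L\to\infty}\Pr\{|\hat{\lambda}_L| \geq \epsilon\} = 0,$$
and therefore, for any $\nu > 0$, $\Pr\{|\hat{\lambda}_L| \geq \epsilon\} < \nu$ for all sufficiently large $L$. Now
$$\begin{aligned}
\Pr\{|W_L(\hat{\lambda}_L)| > \delta\}
&= \Pr\{|W_L(\hat{\lambda}_L)| > \delta \text{ and } |\hat{\lambda}_L| < \epsilon\} + \Pr\{|W_L(\hat{\lambda}_L)| > \delta \text{ and } |\hat{\lambda}_L| \geq \epsilon\} \\
&\leq \Pr\left\{\sup_{|\lambda| < \epsilon}|W_L(\lambda)| > \delta\right\} + \Pr\{|\hat{\lambda}_L| \geq \epsilon\} \\
&< 2\nu
\end{aligned}$$
for all sufficiently large $L$. Since $\nu$ and $\delta$ can be chosen arbitrarily small, it follows that $W_L(\hat{\lambda}_L)$ converges in probability to zero as $L \to \infty$. ■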

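Lemma 17. The distribution of $\sqrt{L}R_L(0)$ converges to the normal with zero mean and variance $pA_1 + dA_2$ as $L \to \infty$.

Proof: Observe that $\sqrt{L}R_L(0) = C_L + D_L$, where
$$C_L = \frac{1}{\sqrt{L}}\sum_{i\in P}R_i\sin(\Phi_i), \qquad D_L = \frac{1}{\sqrt{L}}\sum_{i\in D}R_i\sin\langle\Phi_i\rangle.$$
From the standard central limit theorem, the distribution of $C_L$ converges to the normal with mean $\sqrt{p}\,\mathbb{E}R_1\sin(\Phi_1) = 0$, because the mean of $R_1e^{j\Phi_1}$ is 1, and variance
$$pA_1 = p\,\mathbb{E}R_1^2\sin^2(\Phi_1).$$
Similarly, the distribution of $D_L$ converges to the normal with mean $\sqrt{d}\,\mathbb{E}R_1\sin\langle\Phi_1\rangle = 0$, as a result of Lemma 15, and variance
$$dA_2 = d\,\mathbb{E}R_1^2\sin^2\langle\Phi_1\rangle.$$
The lemma holds because $C_L$ and $D_L$ are independent. ■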
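Lemma 17 can also be checked by simulation. Under the same illustrative assumption used for the earlier $G(0)$ sketch, $R_ie^{j\Phi_i} = 1 + z_i$ with $z_i$ circularly symmetric complex Gaussian, the sketch below compares the empirical variance of $\sqrt{L}R_L(0) = C_L + D_L$ with $pA_1 + dA_2$ computed from a large independent sample. Because both summands have zero mean under this symmetric noise model, the agreement should hold even at moderate $L$. The noise model, helper names and parameter values are assumptions, not the paper's experiment.

```python
import numpy as np

def wrap(x, M):
    """<x>: reduce x modulo 2*pi/M into [-pi/M, pi/M)."""
    return np.mod(x + np.pi / M, 2 * np.pi / M) - np.pi / M

def sample_r_phi(rng, shape, sigma2):
    """ASSUMED noise model: R*exp(j*Phi) = 1 + z, z circular complex Gaussian."""
    z = np.sqrt(sigma2 / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
    return np.abs(1 + z), np.angle(1 + z)

def check_lemma17(M=4, L=200, n_pilots=50, sigma2=0.5, trials=3000, seed=0):
    rng = np.random.default_rng(seed)
    p, d = n_pilots / L, (L - n_pilots) / L
    # Predicted variance p*A1 + d*A2 from a large independent sample
    R, Phi = sample_r_phi(rng, 1_000_000, sigma2)
    A1 = np.mean(R**2 * np.sin(Phi) ** 2)
    A2 = np.mean(R**2 * np.sin(wrap(Phi, M)) ** 2)
    predicted = p * A1 + d * A2
    # Empirical variance of sqrt(L) * R_L(0) over independent trials;
    # the first n_pilots indices play the role of P, the rest of D
    R, Phi = sample_r_phi(rng, (trials, L), sigma2)
    is_pilot = np.arange(L) < n_pilots
    terms = np.where(is_pilot, R * np.sin(Phi), R * np.sin(wrap(Phi, M)))
    stat = terms.sum(axis=1) / np.sqrt(L)
    return predicted, stat.var()

predicted, empirical = check_lemma17()
print(predicted, empirical)
```

An analogous check of Lemma 20 would replace the sines by cosines and subtract the means $h_1(0)$ and $h_2(0)$.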
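Lemma 18. Let $T_L(\lambda) = \mathbb{E}G_L(\lambda)$. We have
$$\sqrt{L}\big(G_L(\hat{\lambda}_L) - T_L(\hat{\lambda}_L)\big) = X_L + o_p(1),$$
where $X_L = \sqrt{L}\big(G_L(0) - T_L(0)\big)$.

Proof: Write
$$\sqrt{L}\big(G_L(\hat{\lambda}_L) - T_L(\hat{\lambda}_L)\big) = Y_L(\hat{\lambda}_L) + X_L,$$
where
$$Y_L(\lambda) = \sqrt{L}\big(G_L(\lambda) - T_L(\lambda)\big) - X_L \tag{29}$$
is an empirical process indexed by $\lambda$, similar to $W_L$ from (28). As with $W_L$, results from the literature on empirical processes can be used to show that for any $\delta > 0$ and $\nu > 0$, there exists $\epsilon > 0$ such that
$$\Pr\left\{\sup_{|\lambda| < \epsilon}|Y_L(\lambda)| > \delta\right\} < \nu.$$
The proof now follows by an argument analogous to that in Lemma 16. ■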

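Lemma 19. $\sqrt{L}\big(T_L(\hat{\lambda}_L) - G(0)\big) = o_p(1)$.

Proof: The argument is similar to that used in Lemma 11. First observe that
$$\begin{aligned}
T_L(\lambda) &= \frac{|P|}{L}\mathbb{E}R_1\cos(\lambda + \Phi_1) + \frac{|D|}{L}\mathbb{E}R_1\cos\langle\lambda + \Phi_1\rangle \\
&= \big(p + o(L^{-1/2})\big)\mathbb{E}R_1\cos(\lambda + \Phi_1) + \big(d + o(L^{-1/2})\big)\mathbb{E}R_1\cos\langle\lambda + \Phi_1\rangle,
\end{aligned}$$
because $\frac{|P|}{L} = p + o(L^{-1/2})$ and $\frac{|D|}{L} = d + o(L^{-1/2})$ (by assumption in Theorem 2). Since $G(0) = ph_1(0) + dh_2(0)$, we have
$$\sqrt{L}\big(T_L(\lambda) - G(0)\big) = \sqrt{L}\,p\,q_1(\lambda) + \sqrt{L}\,d\,q_2(\lambda) + o(1),$$
where
$$q_1(\lambda) = \mathbb{E}R_1\big(\cos(\lambda + \Phi_1) - \cos(\Phi_1)\big) \quad\text{and}\quad q_2(\lambda) = \mathbb{E}R_1\big(\cos\langle\lambda + \Phi_1\rangle - \cos\langle\Phi_1\rangle\big). \tag{30}$$
Lemma 12 shows that $q_1(\hat{\lambda}_L) = \hat{\lambda}_L o_p(1)$ and Lemma 13 shows that $q_2(\hat{\lambda}_L) = \hat{\lambda}_L o_p(1)$, and so
$$\sqrt{L}\big(T_L(\hat{\lambda}_L) - G(0)\big) = \sqrt{L}\,\hat{\lambda}_L o_p(1) + o(1).$$
The lemma follows since $\sqrt{L}\hat{\lambda}_L$ converges in distribution. ■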

Lemma 20. The distribution of
$$X_L = \sqrt{L}\big(G_L(0) - T_L(0)\big) = \sqrt{L}\big(G_L(0) - \mathbb{E}G_L(0)\big)$$
converges, as $L \to \infty$, to the normal with zero mean and variance $pB_1 + dB_2$.

Proof: Observe that $X_L = C_L' + D_L'$, where
$$C_L' = \frac{1}{\sqrt{L}}\sum_{i\in P}R_i\cos(\Phi_i) - \frac{|P|}{\sqrt{L}}h_1(0), \qquad D_L' = \frac{1}{\sqrt{L}}\sum_{i\in D}R_i\cos\langle\Phi_i\rangle - \frac{|D|}{\sqrt{L}}h_2(0),$$
and where $h_1$ and $h_2$ are defined in the statement of Theorem 1. From the standard central limit theorem, the distribution of $C_L'$ converges to the normal with zero mean and variance
$$pB_1 = p\,\mathbb{E}R_1^2\cos^2(\Phi_1) - p\,h_1^2(0) = p\,\mathbb{E}R_1^2\cos^2(\Phi_1) - p,$$
since $h_1(0) = 1$. Similarly, the distribution of $D_L'$ converges to the normal with zero mean and variance
$$dB_2 = d\,\mathbb{E}R_1^2\cos^2\langle\Phi_1\rangle - d\,h_2^2(0).$$
The lemma follows since $C_L'$ and $D_L'$ are independent. ■



